Analyzing and Visualizing Ford GoBike System Data (February 2019)

Presented by Sherif Elghazawy

Table of Contents

  • Introduction
  • Investigation Overview
  • Dataset Overview
  • Univariate Exploration
  • Bivariate Exploration
  • Multivariate Exploration
  • Summary
  • Sources

Introduction

Ford GoBike (now known as Bay Wheels) is a regional public bicycle-sharing system in California's San Francisco Bay Area. It was introduced in 2013 as a pilot program for the region, with 700 bikes and 70 stations across San Francisco and San Jose.

Ford GoBike consists of a fleet of specially designed, environmentally friendly, and durable bikes that are locked into a network of docking stations throughout the city. The bikes can be unlocked from one station and returned to any other station in the system, which makes them ideal for one-way trips.

People use bike share to commute to work or school, run errands, get to appointments or social engagements and more. It's a fun, convenient and affordable way to get around.

The bikes are available 24 hours a day, 7 days a week, 365 days a year, and riders have access to all bikes in the network when they become a member or purchase a pass.

In June 2017, the system was officially relaunched as Ford GoBike in a partnership with Ford Motor Company. After Lyft acquired Motivate, the system's operator, the system was renamed Bay Wheels in June 2019. The system is expected to expand to 7,000 bicycles at around 540 stations in San Francisco, Oakland, Berkeley, Emeryville, and San Jose.

Investigation Overview

In this investigation of the Ford GoBike system, I explore the most influential customer behaviors and characteristics, such as user type, gender, and age. I also explore ride features such as duration, timing, distance, stations, and bike share status. Finally, I investigate how these attributes affect the usage of the Bay Wheels system.

Dataset Overview

  • The dataset consists of 183,412 rows (one per bike ride) and 16 attributes. The attributes cover ride statistics (ride duration, start and end time, and station names and coordinates), user information (user type, birth year, and gender), and additional features such as bike ID and bike share status. The 17,318 missing values were imputed rather than removed, because the rows that contain them are still needed to analyze and plot the other attributes of interest.

  • After wrangling, the dataset has 27 columns instead of 16, because 11 derived attributes were extracted to support the exploratory analysis and visualization.
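The wrangling code itself is not shown in this report. The sketch below shows one way to build derived columns like these with pandas. It assumes the raw column names of the public February 2019 trip file (`duration_sec`, `start_time`, `end_time`, `member_birth_year`, `member_gender`) and reuses derived names that appear later in this report (`duration_minute`, `duration_hour`, `start_time_hour`, `start_time_day`, `member_age`). The other derived names (e.g. `start_time_weekday`), the reference year 2019, and the `"Not defined"` label for missing gender are assumptions, not the original notebook's choices.

```python
import pandas as pd


def wrangle(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical reconstruction of the derived columns used in this report."""
    df = df.copy()
    df["start_time"] = pd.to_datetime(df["start_time"])
    df["end_time"] = pd.to_datetime(df["end_time"])

    # Duration in other units, derived from duration_sec
    df["duration_minute"] = df["duration_sec"] / 60
    df["duration_hour"] = df["duration_sec"] / 3600

    # Calendar features for both start and end timestamps
    for prefix in ("start", "end"):
        ts = df[f"{prefix}_time"]
        df[f"{prefix}_time_hour"] = ts.dt.hour
        df[f"{prefix}_time_day"] = ts.dt.day
        df[f"{prefix}_time_weekday"] = ts.dt.day_name()
        df[f"{prefix}_time_month"] = ts.dt.month_name()

    # Age in 2019; missing birth years are imputed with 0 (assumed imputation)
    df["member_age"] = (2019 - df["member_birth_year"]).fillna(0).astype(int)
    # Missing genders get a placeholder label (label name is an assumption)
    df["member_gender"] = df["member_gender"].fillna("Not defined")
    return df


# Two made-up rides: one ordinary, one crossing midnight into March
raw = pd.DataFrame({
    "duration_sec": [726, 3600],
    "start_time": ["2019-02-28 17:32:10", "2019-02-28 23:55:00"],
    "end_time": ["2019-02-28 17:44:16", "2019-03-01 00:55:00"],
    "member_birth_year": [1984.0, None],
    "member_gender": ["Male", None],
})
wrangled = wrangle(raw)
print(wrangled.T)
```

Imputing rather than dropping keeps every ride available for the plots that do not use age or gender. The imputed 0 ages must then be filtered out before any age statistics are computed.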

Exploratory Analysis and Visualization

What is/are the main feature(s) of interest in your dataset?

The main features I am most interested in are ride duration, ride timing, ride stations and distance, and user types and characteristics. My investigation primarily explores the patterns and correlations among these features. It also examines how user behaviors and characteristics related to these features influence the usage of the GoBike system. I explore the dataset with the goal of answering three research questions: When are most trips taken in terms of time of day, day of the week, or month of the year? How long does the average trip take? Do these answers depend on whether a user is a subscriber or a customer?

What features in the dataset do you think will help support your investigation into your feature(s) of interest?

The supporting features that can facilitate the investigation of the features of interest include:

  • User characteristics (user type, member gender, and member age) can help explore how this information is associated with other features.
  • Ride duration in seconds, minutes, and hours can help identify how long a trip takes.
  • Ride start and end times by hour, day, weekday, and month can help identify when most trips take place.
  • Ride stations and distance can help identify the major stations riders travel from and to, and the average distance they travel.
  • Bike ID and bike share status can help identify whether there is a specific pattern of bike usage.

Univariate Exploration

In this section, I investigate and plot the distribution of each of the most influential variables in the dataset individually, one feature at a time.

1. Ride Duration

1.1 Duration in seconds

  • Variable Type: Quantitative/Numeric
  • Appropriate Plot: Histogram

Insights:

  • Duration in seconds is highly right-skewed, with the data concentrated at the left in a log-normal shape: more than 175k rides last less than 10k seconds.
  • Changing the x-axis limit shows that the data is clustered between 7.5k and 10k seconds.
  • Accordingly, I applied a log transformation to the x-axis to explore the distribution in more depth, as follows.

Insights:

  • On the log-transformed x-axis, the data is approximately normally distributed, with a slight right skew and a mean of 726 seconds.
  • Most rides last between 300 and 1,000 seconds, and 95% of rides last less than 1,571 seconds.
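Below is a minimal sketch of a log-scaled duration histogram using NumPy and Matplotlib. The log-spaced bins, the 20 bins per decade, and the synthetic log-normal sample are illustrative choices, not the original notebook's settings. On the real `duration_sec` column, `mean()` and `np.percentile(..., 95)` are the calls that would give statistics like the 726 s and 1,571 s quoted above.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt


def log_bins(values, bins_per_decade=20):
    """Logarithmically spaced bin edges covering all strictly positive values."""
    values = np.asarray(values, dtype=float)
    values = values[values > 0]  # log spacing needs strictly positive values
    lo, hi = np.log10(values.min()), np.log10(values.max())
    n_bins = max(1, int(np.ceil((hi - lo) * bins_per_decade)))
    edges = np.logspace(lo, hi, n_bins + 1)
    # Pin the outer edges exactly to the data range so float round-off cannot drop a value
    edges[0], edges[-1] = values.min(), values.max()
    return edges


# Synthetic log-normal durations standing in for duration_sec (not the real trip data)
rng = np.random.default_rng(0)
durations = rng.lognormal(mean=np.log(600), sigma=0.6, size=10_000)

edges = log_bins(durations)
fig, ax = plt.subplots()
ax.hist(durations, bins=edges)
ax.set_xscale("log")
ax.set_xlabel("Duration (seconds, log scale)")
ax.set_ylabel("Number of rides")

print(f"mean: {durations.mean():.0f} s, 95th percentile: {np.percentile(durations, 95):.0f} s")
```

Log-spaced bins give every bar the same width on a log axis. Linear bins drawn on a log axis would look very wide at the short-duration end and very narrow in the tail.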

1.2 Duration in minutes

  • Variable Type: Quantitative/Numeric
  • Appropriate Plot: Histogram

Insights:

  • Duration in minutes is derived from duration in seconds, so it has the same log-normal, right-skewed distribution. Most rides (about 175k) last less than 150 minutes.
  • Rescaling the x-axis shows that most of the data is clustered in the same position (> 150 minutes).
  • Again, I applied a log transformation to the x-axis to explore the distribution in more depth, as follows.

Insights:

  • On the log-transformed x-axis, the data is approximately normally distributed, with a mean of 12 minutes.
  • 95% of rides last less than 26 minutes.

1.3 Duration in hours

  • Variable Type: Quantitative/Numeric
  • Appropriate Plot: Histogram

Insights:

  • Duration in hours is also derived from duration in seconds and has the same log-normal, right-skewed distribution. Most rides (about 175k) last less than 2.5 hours.
  • Rescaling the x-axis shows that the data is clustered in the same position (> 2.5 hours).
  • Similarly, I applied a log transformation to the x-axis to explore the distribution in more depth, as follows.

Insights:

  • On the log-transformed x-axis, the data is highly right-skewed, with a mean of 0.2 hours (726 seconds).
  • 95% of rides last less than about 0.44 hours (1,571 seconds, or roughly 26 minutes).
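`duration_minute` and `duration_hour` are linear rescalings of `duration_sec`. Their means and percentiles are therefore just the second-based figures divided by 60 and 3,600. This short check converts the section 1.1 statistics (mean 726 s, 95th percentile 1,571 s). It is plain arithmetic and uses no project code.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_HOUR = 3600


def convert(seconds: float) -> dict:
    """Express a duration given in seconds in minutes and hours."""
    return {
        "sec": seconds,
        "min": seconds / SECONDS_PER_MINUTE,
        "hour": seconds / SECONDS_PER_HOUR,
    }


mean = convert(726)   # reported mean of duration_sec
p95 = convert(1571)   # reported 95th percentile of duration_sec

print(f"mean: {mean['min']:.1f} min = {mean['hour']:.2f} h")  # mean: 12.1 min = 0.20 h
print(f"p95:  {p95['min']:.1f} min = {p95['hour']:.2f} h")    # p95:  26.2 min = 0.44 h
```

The minute values agree with sections 1.1 and 1.2 (a mean of 12 minutes and a 95th percentile of 26 minutes).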

2. Ride Start and End Times

2.1 Start Time Hour and End Time Hour

  • Variable Type: Quantitative/Numeric
  • Appropriate Plot: Histogram

Insights:

  • Start hour and end hour follow the same bimodal distribution, with a very slight left skew.
  • Most rides start in two peaks: 8-9 AM (about 50k rides) and 2-6 PM (30k-40k rides). Rides end in the same two peaks: 8-9 AM (about 50k rides) and 2-6 PM (30k-40k rides).
  • I applied a log transformation to the x-axis to explore the distribution in more depth, as follows.

Insights:

  • The transformed start and end hours follow the same left-skewed bimodal distribution.
  • Rides start in two peaks: 8-11 AM (15k-25k rides) and 1-9 PM (15k-35k rides). Similarly, rides end in two peaks: 8-11 AM (about 20k rides) and 1-9 PM (15k-35k rides).
  • 50% of rides start before 2 PM, with a mean start hour of 13.45 (about 1:27 PM). 50% of rides end before 2 PM, with a mean end hour of 13.6 (about 1:36 PM).
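Below is a sketch of how hourly peaks and a decimal mean hour can be computed with pandas. The helper names and the tiny sample are hypothetical. A mean hour such as 13.45 is a decimal number of hours, so its fractional part has to be multiplied by 60 before it can be read as clock time.

```python
import pandas as pd


def hourly_profile(start_times: pd.Series) -> pd.Series:
    """Count rides per hour of day (0-23), including hours with no rides."""
    hours = pd.to_datetime(start_times).dt.hour
    return hours.value_counts().reindex(range(24), fill_value=0).sort_index()


def decimal_hour_to_clock(h: float) -> str:
    """Convert a decimal hour such as 13.45 into HH:MM on a 24-hour clock."""
    total_minutes = round(h * 60)
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


# Tiny synthetic sample, not the real trip data
sample = pd.Series([
    "2019-02-01 08:05", "2019-02-01 08:40", "2019-02-01 17:10",
    "2019-02-02 17:55", "2019-02-02 17:20", "2019-02-03 12:00",
])
profile = hourly_profile(sample)
print(profile.nlargest(2))            # the two busiest hours
print(decimal_hour_to_clock(13.45))   # 13:27
print(decimal_hour_to_clock(13.6))    # 13:36
```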

2.3 Start Time Day and End Time Day

  • Variable Type: Quantitative/Numeric
  • Appropriate Plot: Bar Chart and Histogram

Insights:

  • The start days with the most rides are 5-7, 11-12, 14-15, 19-22, 25, and 27-28, with about 8k-10k rides per day.
  • Similarly, the end days with the most rides are 5-7, 11-12, 14-15, 19-22, 25, and 27-28, with about 8k-10k rides per day.

Insights:

  • Sorting the start and end days and plotting them horizontally confirms these observations:
  1. The start days with the most rides are 5-7, 11-12, 14-15, 19-22, and 27-28, with about 8k-10k rides per day.
  2. Similarly, the end days with the most rides are 5-7, 11-12, 14-15, 19-22, 25, and 27-28, with about 8k-10k rides per day.
  • Moreover, the days with the fewest rides are 2-3, 9, and 13.
  • To gain more insight, I then explored how start and end days are distributed.

Insights:

  • Start days and end days follow the same bimodal distribution, with a left skew.
  • Most rides start on days 4-8, 13-15, 20-22, and 25-28. Similarly, most rides end on days 4-8, 13-15, and 20-22.
  • I applied a log transformation to the x-axis to explore the distribution in more depth, as follows.

Insights:

  • The transformed start and end days follow the same left-skewed multimodal distribution, with three peaks.
  • Rides start in three peaks: days 4-7, days 10-13, and days 19-28. Rides end in the same three peaks.
  • The mean number of rides per day (by start or end day) is 6,550.
  • 50% of rides started and ended before day 15.

2.4 Start Time Weekday and End Time Weekday

  • Variable Type: Qualitative/Categorical
  • Appropriate Plot: Bar Chart and Pie Chart/Waffle Plot

Insights:

  • Start and end weekdays follow the same distribution.
  • Thursday has the most rides (35k), while Saturday and Sunday have the fewest (about 15k each).
  • Working weekdays (26k-35k rides each) have more rides than weekend days (15k each).
  • This suggests that most riders are commuters rather than people who ride for fun or sport.

Insights:

  • Start and end weekdays show the same proportions.
  • Thursday has the largest share of rides (19%), while Saturday and Sunday have the smallest (8% each).
  • Working weekdays (15%-19% each) account for a larger share of rides than weekend days (8% each).
  • The average number of rides per weekday is 26,201 for both start and end weekdays. The 95th percentile is 34,181 rides for start weekdays and 34,175 rides for end weekdays.
  • Again, this suggests that most riders are commuters rather than people who ride for fun or sport.
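Below is a sketch of the weekday tally with pandas; the helper name is hypothetical. It orders weekdays by calendar rather than by frequency and converts the counts to percentages. February 2019 contains exactly four of each weekday (28 days starting on a Friday), so raw weekday counts in this dataset can be compared directly without adjusting for how often each weekday occurs.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekday_summary(start_times: pd.Series) -> pd.DataFrame:
    """Ride counts and percentage share per weekday, in calendar order."""
    names = pd.to_datetime(start_times).dt.day_name()
    counts = names.value_counts().reindex(WEEKDAYS, fill_value=0)
    return pd.DataFrame({"rides": counts, "share_pct": 100 * counts / counts.sum()})


# Synthetic timestamps: one ride on each day of February 2019
feb = pd.Series(pd.date_range("2019-02-01", "2019-02-28", freq="D"))
summary = weekday_summary(feb)
weekend_share = summary.loc[["Saturday", "Sunday"], "share_pct"].sum()
print(summary)
print(f"weekend share: {weekend_share:.1f}%")  # weekend share: 28.6%
```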

2.5 Start Time Month and End Time Month

  • Variable Type: Qualitative/Categorical
  • Appropriate Plot: Bar Chart and Pie Chart

Insights:

  • All 183,412 rides start in February, and almost all of them (183,396) also end in February. Only 16 rides end in March.
  • 99.99% of rides end in February and 0.01% end in March.

3. Ride Stations and Distance

3.1 Ride Stations

  • Variable Type: Qualitative/Categorical
  • Appropriate Plot: Bar Chart and Pie Chart/Waffle Plot

Insights:

  • The number of rides per start station and per end station are both right-skewed, with most stations near the mean of about 556 rides per station.
  • The busiest end station handles more rides (4,857) than the busiest start station (3,904).
  • To better understand the distribution of start and end stations, I explore how stations are clustered and how they perform in each city. First I visualize the station clusters by city, and then I plot each city and its stations.

Insights:

  • In the plots above, the 330 stations form three clusters (from top left to bottom right): San Jose, Oakland_Berkeley, and San Francisco.
  • By longitude, San Jose stations lie at longitude > -122.1, Oakland_Berkeley stations at longitude > -122.35 and < -122.1, and San Francisco stations at longitude > -122.5 and < -122.35.
  • Accordingly, we can filter stations by longitude and create subsets of the data to mask and visualize each cluster by city. I use 'start_station_longitude' to create the masks, since a station's coordinates are essentially the same whether it appears as a start or an end station.
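The longitude masks described above can be expressed as a single labeling step with `numpy.select`. This is a sketch. The cut-offs are the approximate ones stated in this section. How a value exactly on a boundary is assigned is an assumption, and so is the `"Other"` label for stations outside all three ranges.

```python
import numpy as np
import pandas as pd


def assign_city(longitude: pd.Series) -> pd.Series:
    """Label stations by city using the approximate longitude cut-offs above."""
    conditions = [
        longitude > -122.1,                              # San Jose
        (longitude > -122.35) & (longitude <= -122.1),   # Oakland_Berkeley
        (longitude > -122.5) & (longitude <= -122.35),   # San Francisco
    ]
    labels = ["San Jose", "Oakland_Berkeley", "San Francisco"]
    # np.select picks the label of the first condition that is True for each row
    return pd.Series(np.select(conditions, labels, default="Other"), index=longitude.index)


# Made-up longitudes, one per region plus one outside all ranges
stations = pd.DataFrame({"start_station_longitude": [-121.89, -122.27, -122.42, -122.6]})
stations["city"] = assign_city(stations["start_station_longitude"])
print(stations)
```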

Insights:

  • San Francisco exceeds both Oakland_Berkeley and San Jose on every statistic.
  • San Francisco has 156 stations, Oakland_Berkeley has 127, and San Jose has 47.
  • The average number of rides per day is 4,774 in San Francisco, 1,480 in Oakland_Berkeley, and 295 in San Jose.
  • Avg. number of rides per station in San Francisco is , in Oakland_Berkeley is 1,480, and in San Jose is 295.
  • The average number of rides per station per day is 31 in San Francisco, 12 in Oakland_Berkeley, and 6 in San Jose.
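The per-city figures can be derived with a `groupby`. This sketch assumes a `city` column like the one the longitude masks produce. It counts stations by `start_station_id`, which is an assumption, since the report does not say which column identifies a station. The figures above are internally consistent: 4,774 rides per day across 156 San Francisco stations is about 31 rides per station per day, and 1,480 / 127 ≈ 12 and 295 / 47 ≈ 6 match the Oakland_Berkeley and San Jose values.

```python
import pandas as pd

N_DAYS = 28  # February 2019


def city_stats(rides: pd.DataFrame) -> pd.DataFrame:
    """Stations, rides per day, and rides per station per day for each city."""
    grouped = rides.groupby("city")
    stats = pd.DataFrame({
        "stations": grouped["start_station_id"].nunique(),
        "rides": grouped.size(),
    })
    stats["rides_per_day"] = stats["rides"] / N_DAYS
    stats["rides_per_station_per_day"] = stats["rides_per_day"] / stats["stations"]
    return stats


# Made-up rides: 56 rides over 2 San Francisco stations, 28 rides at 1 San Jose station
rides = pd.DataFrame({
    "city": ["San Francisco"] * 56 + ["San Jose"] * 28,
    "start_station_id": [1, 2] * 28 + [9] * 28,
})
stats = city_stats(rides)
print(stats)
```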

3.2 Ride Distance

  • Variable Type: Quantitative/Numeric
  • Appropriate Plot: Histogram

Insights:

  • The ride distance distribution is highly right-skewed, with the data concentrated at the left in a log-normal shape: more than 175k rides are shorter than about 8 km.
  • Changing the scale limit of the distance axis did not help much, because the data remains tightly clustered below about 7 km, with some outliers.
  • Finally, I applied a log transformation to the x-axis to learn more about the distance distribution, as follows.

Insights:

  • The transformed distribution is bimodal, with one peak at 1-1.5 km and another at 1.5-2.6 km.
  • The average distance per ride is 1.7 km, and 95% of rides are shorter than 3.8 km.
  • Each distance bin between 1 and 2.6 km contains more than 6k rides.
  • 9,170 rides are longer than 3.8 km.
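The report does not show how `distance_km` was computed. One common approach, assumed here, is the great-circle (haversine) distance between the start and end station coordinates. It measures the straight-line distance between stations, not the route actually ridden, and it is 0 km for round trips that start and end at the same station.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points in decimal degrees (works on arrays)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Usage on the trip data (column names assumed from the public trip file):
# df["distance_km"] = haversine_km(df["start_station_latitude"], df["start_station_longitude"],
#                                  df["end_station_latitude"], df["end_station_longitude"])
print(round(haversine_km(0, 0, 1, 0), 3))  # 111.195 (one degree of latitude)
```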

4. Bike Id and Bike Share

4.1 Bike Id

  • Variable Type: Quantitative/Numeric
  • Appropriate Plot: Histogram

Insights:

  • There are 4,646 unique bike IDs.
  • Each bike ID has 39 rides on average.
  • The maximum number of rides per bike ID is 191 and the minimum is 1.
  • 90% of bike IDs have fewer than 100 rides, and the bottom 10% have 6 rides or fewer.
  • The top 10% of bike IDs comprises 465 bikes and the bottom 10% comprises 565 bikes.
  • The distribution of bike IDs is right-skewed, with most of the most-used IDs clustered in the 4k-6k range.
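`bike_id` is an identifier, so counting rides per ID with `value_counts` is a natural way to get the per-bike statistics above. The sample IDs below are made up. As a sanity check on the reported figures, 183,412 rides over 4,646 bikes is about 39.5 rides per bike, in line with the average of 39.

```python
import pandas as pd


def rides_per_bike(bike_ids: pd.Series) -> pd.Series:
    """Number of rides recorded for each bike ID, most-used first."""
    return bike_ids.value_counts()


sample_ids = pd.Series([4883, 4883, 4883, 5200, 5200, 17])
counts = rides_per_bike(sample_ids)
print(counts.describe())      # mean, min, max, and quartiles of rides per bike
print(counts.quantile(0.9))   # 90th percentile of rides per bike
```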

4.2 Bike Share

  • Variable Type: Qualitative/Categorical
  • Appropriate Plot: Bar Chart and Pie Chart/Waffle Plot

Insights:

  • In 166,053 rides (91% of all rides), the bike was not shared during the trip.
  • In 17,359 rides (9% of all rides), the bike was shared during the trip.

5. User Characteristics

5.1 User Type

  • Variable Type: Qualitative/Categorical
  • Appropriate Plot: Bar Chart and Pie Chart/Waffle Plot

Insights:

  • Subscribers account for 163,544 rides (89% of all rides).
  • Customers account for 19,868 rides (11% of all rides).

5.2 Member Gender

  • Variable Type: Qualitative/Categorical
  • Appropriate Plot: Bar Chart and Pie Chart/Waffle Plot

Insights:

  • Male users account for 130,651 rides (71% of all rides).
  • Female users account for 40,844 rides (22% of all rides).
  • Users with an undefined gender account for 8,265 rides (5% of all rides).
  • Users whose gender is recorded as "other" account for 3,652 rides (2% of all rides).

5.3 Member Age

  • Variable Type: Quantitative/Numeric
  • Appropriate Plot: Histogram

Insights:

  • The member age distribution is right-skewed. Rescaling the x-axis did not change the skew, so I applied a log transformation to the x-axis to see whether the data becomes closer to normal, as follows.

Insights:

  • The average member age is 32; 75% of ages are under 38 and 99% are under 63.
  • Member ages are right-skewed, with most ages clustered between 18 and 40 in both the non-scaled and scaled distributions.
  • However, log-transforming the age axis shows that most ages are clustered between 23 and 40.
  • To better understand the age distribution, I removed age outliers and the age value of 0, which was previously used to impute missing values in the 'member_age' attribute, as follows.

Insights:

  • 99% of ages (181,598 records) are below 63 years.
  • 1% of ages (1,814 records) are above 63 years.
  • 8,265 records have an age of 0 or below 18.
  • 173,333 records remain with ages from 18 to 63.
  • Both the scaled and non-scaled age distributions are right-skewed, with most ages clustered between 23 and 39.
  • Log-transforming the age axis gives very similar findings: ages are clustered between 23 and 41, with two peaks, one at 23-31 and the other at 32-41.
  • This led me to create member age groups to compare the distribution of rides among them.

Insights:

  • The 20-30 age group has the most rides (40%), followed by the 30-40 age group (36.5%).
  • The 70-141 age group has the fewest rides (0.3%).
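Below is a sketch of the age grouping with `pandas.cut`. The bin edges are hypothetical: the report names the 20-30, 30-40, and 70-141 groups but not every edge. The remaining edges are assumptions, as are the left-closed intervals (a 30-year-old falls in 30-40) and the 18-year minimum used to drop the imputed 0 ages.

```python
import pandas as pd

# Hypothetical edges; with right=False the last interval [70, 142) includes age 141
EDGES = [18, 20, 30, 40, 50, 60, 70, 142]
LABELS = ["18-20", "20-30", "30-40", "40-50", "50-60", "60-70", "70-141"]


def age_group_shares(ages: pd.Series) -> pd.Series:
    """Percentage of rides per age group, after dropping imputed 0 and under-18 ages."""
    valid = ages[ages >= 18]
    groups = pd.cut(valid, bins=EDGES, labels=LABELS, right=False)
    # sort=False keeps the groups in bin order instead of ordering by frequency
    return groups.value_counts(normalize=True, sort=False) * 100


ages = pd.Series([0, 19, 25, 27, 33, 71])  # made-up ages; 0 stands for an imputed value
shares = age_group_shares(ages)
print(shares)
```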

Distributions of variables of interest and types of transformations needed:

I found that the following variables are skewed or log-normally distributed and need an x-axis or y-axis transformation:

Duration Variables:

  1. 'duration_sec' variable: Duration in seconds is highly right-skewed, with the data concentrated at the left in a log-normal shape: more than 175k rides last less than 10k seconds.
  2. 'duration_minute' variable: Duration in minutes is derived from duration in seconds, so it has the same log-normal, right-skewed distribution. Most rides (about 175k) last less than 150 minutes.
  3. 'duration_hour' variable: Duration in hours is also derived from duration in seconds and has the same log-normal, right-skewed distribution. Most rides (about 175k) last less than 2.5 hours.

Start and End Time Variables:

  1. 'start_time_hour' & 'end_time_hour' variables: Start hour and end hour follow the same bimodal distribution, with a very slight left skew.
  2. 'start_time_day' & 'end_time_day' variables: Start days and end days follow the same bimodal distribution, with a left skew.

Ride Distance(km) Variable:

  1. 'distance_km' variable: The ride distance distribution is highly right-skewed, with the data concentrated at the left in a log-normal shape: more than 175k rides are shorter than about 8 km.

Member Age variable:

  1. 'member_age' variable: The member age distribution is right-skewed. Rescaling the x-axis did not change the skew, so I applied a log transformation to the x-axis to see whether the data becomes closer to normal.

Features with unusual distributions and operations/transformations applied:

I applied several transformations to the x-axis (and in some cases the y-axis) of skewed and log-normal variables to explore their distributions in more depth. These transformations are summarized as follows:

Duration Variables:

  1. 'duration_sec' variable: I changed the x-axis scale limit and applied a log transformation to the x-axis. Changing the x-axis limit shows that the data is clustered between 7.5k and 10k seconds. The log transformation shows that the data is approximately normally distributed, with a slight right skew and a mean of 726 seconds.
  2. 'duration_minute' variable: I changed the x-axis scale limit and applied a log transformation to the x-axis. Rescaling the x-axis shows that most of the data is clustered in the same position (> 150 minutes). The log transformation shows that the data is approximately normally distributed, with a mean of 12 minutes.
  3. 'duration_hour' variable: I changed the x-axis scale limit and applied a log transformation to the x-axis. Rescaling the x-axis shows that the data is clustered in the same position (> 2.5 hours). The log transformation shows that the data is highly right-skewed, with a mean of 0.2 hours.

Start and End Time Variables:

  1. 'start_time_hour' & 'end_time_hour' variables: I applied a log transformation to the x-axis, which confirms that start and end hours follow the same left-skewed bimodal distribution. 50% of rides start before 2 PM, with a mean start hour of 13.45 (about 1:27 PM). 50% of rides end before 2 PM, with a mean end hour of 13.6 (about 1:36 PM).
  2. 'start_time_day' & 'end_time_day' variables: Start days and end days follow the same bimodal distribution, with a left skew. Log-transforming the x-axis shows that they follow the same left-skewed multimodal distribution, and 50% of rides started and ended before day 15.

Ride Distance(km) Variable:

  1. 'distance_km' variable: Changing the scale limit of the distance axis did not help much, because the data remains tightly clustered below about 7 km, with some outliers. A log transformation of the x-axis shows a bimodal distribution with two peaks, one at 1-1.5 km and another at 1.5-2.6 km.

Member Age variable:

  1. 'member_age' variable: Member ages are right-skewed, with most ages clustered between 18 and 40 in both the non-scaled and scaled distributions. Log-transforming the age axis shows that most ages are clustered between 23 and 40. After removing age outliers, both the scaled and non-scaled age distributions are right-skewed, with most ages clustered between 23 and 39. Log-transforming the age axis then gives very similar findings: ages are clustered between 23 and 41, with two peaks, one at 23-31 and the other at 32-41.

Bivariate Exploration

In this section, I investigate and plot the correlations, patterns, trends, and relationships between pairs of variables in the dataset, two features at a time.

1. Categorical Variable vs. Categorical Variable

We can plot the following 9 categorical relationships among the major categorical variables of user type, member gender, bike share status, age groups, and ride start and end stations:

  1. User Type vs. Member Gender & User Type vs. Bike Share (2 plots)
  2. User Type vs. Age Groups (1 plot)
  3. User Type vs. Ride Start and End Stations (2 plots)
  4. Member Gender vs. Bike Share (1 plot)
  5. Member Gender vs. Age Groups (1 plot)
  6. Member Gender vs. Ride Stations (2 plots)
  7. Bike Share vs. Age Groups (1 plot)
  8. Bike Share vs. Ride Stations (2 plots)

1.1 User Type vs. Member Gender & User Type vs. Bike Share (2 plots)

Insights:

  • Most subscribers are male (73% of subscribers), followed by female (22%).
  • Most customers are male (58% of customers), followed by female (23%).
  • 89.4% of subscribers do not share bikes during their rides, and 10.6% do.
  • No customers share bikes during their rides.
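Percentages such as "73% of subscribers are male" are row-normalized cross-tabulations. Below is a sketch with `pandas.crosstab` on made-up rows; the column names `user_type` and `member_gender` are assumed from the public trip file.

```python
import pandas as pd


def share_within(rows: pd.Series, cols: pd.Series) -> pd.DataFrame:
    """Percentage breakdown of `cols` within each level of `rows` (each row sums to 100)."""
    return pd.crosstab(rows, cols, normalize="index") * 100


df = pd.DataFrame({
    "user_type": ["Subscriber", "Subscriber", "Subscriber", "Customer"],
    "member_gender": ["Male", "Male", "Female", "Male"],
})
table = share_within(df["user_type"], df["member_gender"])
print(table.round(1))
```

Normalizing by row ("index") answers "what share of subscribers are male?". Normalizing by column ("columns") would instead answer "what share of male riders are subscribers?".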

1.2 User Type vs. Age Groups (1 plot)

Insights:

  • For subscribers, the 20-30 age group is the largest (39.7%), followed by the 30-40 age group (39.4%); the smallest is the 70-141 age group (0.001%).
  • For customers, the 20-30 age group is the largest (40%), followed by the 30-40 age group (36.3%); the smallest is the 70-141 age group (0.004%).

1.3 User Type vs. Start and End Stations (2 plots)

Insights:

  • Subscribers use both start and end stations more than customers do.

1.4 Member Gender vs. Bike Share (1 plot)

Insights:

  • 90% of males do not share bikes during the ride, and 10% do.
  • 91% of females do not share bikes during the ride, and 9% do.
  • No users with an undefined gender share bikes during the ride.
  • 82% of users of other genders do not share bikes during the ride, and 18% do.

1.5 Member Gender vs. Age Groups (1 plot)

Insights:

  • For males, the 20-30 age group is the largest (39%), followed by the 30-40 age group (36.3%); the smallest is the 70-141 age group (0.004%).
  • For females, the 20-30 age group is the largest (44%), followed by the 30-40 age group (36%); the smallest is the 70-141 age group (0.002%).
  • For the other gender category, the 30-40 age group is the largest (43.3%), followed by the 20-30 age group (33%); the smallest is the 70-141 age group (0.0025%).

1.6 Member Gender vs. Ride Stations (2 plots)

Insights:

  • Males use both start and end stations more than other genders do.

1.7 Bike Share vs. Age Groups (1 plot)

Insights:

  • For non-shared rides, the 30-40 age group is the largest (39%), followed by the 20-30 age group (38%); the smallest is the 70-141 age group (0.0024%).
  • For shared rides, the 20-30 age group is the largest (58%), followed by the 30-40 age group (13%); the smallest is the 70-141 age group (0.01%).

1.8 Bike Share vs. Ride Stations (2 plots)

Insights:

  • Non-shared rides outnumber shared rides at both start and end stations.

2. Numeric Variable vs. Numeric Variable

We can plot many relationships among the major numeric variables of member age, ride start and end times, ride duration, ride distance, and bike ID, as follows:

  1. Member Age vs. Ride Duration in Seconds, Minutes, Hours (3 Plots)
  2. Member Age vs. Ride Start Time Hour and End Time Hour (2 Plots)
  3. Member Age vs. Ride Start Time Day and End Time Day (2 Plots)
  4. Member Age vs. Ride Distance (1 Plot)
  5. Member Age vs. Bike Id (1 Plot)
  6. Ride Start and End Time Hour vs. Ride Duration in Seconds, Minutes, and Hours (6 Plots)
  7. Ride Start and End Time Day vs. Ride Duration in Seconds, Minutes, and Hours (6 Plots)
  8. Ride Start and End Time Hour vs. Ride Distance (2 Plots)
  9. Ride Start and End Time Day vs. Ride Distance (2 Plots)
  10. Ride Start and End Time Hour vs. Bike Id (2 Plots)
  11. Ride Start and End Time Day vs. Bike Id (2 Plots)
  12. Ride Duration in Seconds, Minutes, and Hours vs. Bike Id (3 Plots)
  13. Ride Duration in Seconds, Minutes, and Hours vs. Ride Distance (3 Plots)
  14. Ride Distance vs. Bike Id (1 Plot)

2.1 Member Age vs. Ride Duration in Seconds, Minutes, Hours (3 Plots)

Insights:

  • There is a negative correlation between member age and ride duration: duration decreases as age increases.
  • Riders aged 23-40 have longer ride durations than other ages.
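The correlations in this section are read from scatter plots. A correlation coefficient would make "weak" or "negative" checkable. Below is a sketch that computes Pearson and Spearman coefficients with pandas. Spearman is computed here as Pearson correlation on average ranks, and being rank-based it is less affected by the long right tail of ride durations. The coefficients are a suggestion, not something the report computed, and the data below is synthetic.

```python
import numpy as np
import pandas as pd


def correlations(x: pd.Series, y: pd.Series) -> dict:
    """Pearson (linear) and Spearman (rank) correlation between two numeric columns."""
    return {
        "pearson": x.corr(y),                  # pandas defaults to Pearson
        "spearman": x.rank().corr(y.rank()),   # Spearman = Pearson on average ranks
    }


# Synthetic data: ride duration falls slightly with age, plus noise (not the real trip data)
rng = np.random.default_rng(1)
age = pd.Series(rng.integers(18, 64, size=2_000))
duration = pd.Series(900 - 5 * age + rng.normal(0, 150, size=2_000))

# On the real data, drop the imputed 0 ages first, e.g. keep rows with member_age >= 18
result = correlations(age, duration)
print({name: round(value, 2) for name, value in result.items()})
```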

2.2 Member Age vs. Ride Start Time Hour and End Time Hour (2 Plots)

Insights:

  • Ages share the same distribution across start and end hours.
  • There is a negative correlation between member age and start and end hour: start and end hours decrease as age increases.
  • Most rides start and end at 7-9 AM and 4-6 PM.
  • Riders aged 18-40 account for the most rides, which mostly start and end at 7-9 AM and 4-6 PM.

2.3 Member Age vs. Ride Start Time Day and End Time Day (2 Plots)

Insights:

  • As with start and end hours, ages share the same distribution across start and end days.
  • There is a negative correlation between member age and start and end day: as age increases, start and end day decrease.
  • Although the number of rides decreases as age increases, riders of all ages mostly start and end on days 4-9, 12-17, and 19-28.
  • Riders aged 18-40 account for the most rides on most start and end days.

2.4 Member Age vs. Ride Distance (2 Plots)

Insights:

  • 99.9% of rides are shorter than 8 km, and distances range from 0.17 km to 8 km.
  • In general, there is a very weak positive correlation between distance and member age.

2.5 Member Age vs. Bike Id (2 Plots)

Insights:

  • There is a negative correlation between member age and the bike IDs used: as age increases, the range of bike IDs used narrows.

2.6 Ride Start and End Time Hour vs. Ride Duration in Seconds, Minutes, and Hours (6 Plots)

Insights:

  • There is a very weak positive correlation between start and end hour and ride duration: as start and end hours increase, durations increase slightly.

2.7 Ride Start and End Time Day vs. Ride Duration in Seconds, Minutes, and Hours (6 Plots)

Insights:

  • There is a very weak positive correlation between start and end day and ride duration: as start and end days increase, durations increase slightly.

2.8 Ride Start and End Time Hour vs. Ride Distance (2 Plots)

Insights:

  • There is a very weak negative correlation between start and end hour and distance: as start and end hours increase, distance decreases slightly.

2.9 Ride Start and End Time Day vs. Ride Distance (2 Plots)

Insights:

  • There is a very weak positive correlation between start and end day and distance: as start and end days increase, distance increases slightly.

2.10 Ride Start and End Time Hour vs. Bike Id (2 Plots)

Insights:

  • There is a positive correlation between start and end hour and the range of bike IDs used: as start and end hours increase, the range of bike IDs used widens from about 4k upward, with the most-used bikes in the 4.5k-5.3k range.

2.11 Ride Start and End Time Day vs. Bike Id (2 Plots)

Insights:

  • There is a positive correlation between start and end day and the range of bike IDs used: as start and end days increase, the range of bike IDs used widens from about 4k upward, with the most-used bikes in the 4.5k-5.3k range.

2.12 Ride Duration in Seconds, Minutes, and Hours vs. Bike Id (3 Plots)

Insights:

  • There is a weak negative correlation between ride duration (in seconds, minutes, and hours) and the bike IDs used: as duration increases, the number of bike IDs decreases.
  • Most bike IDs are used for rides shorter than 10k seconds, 200 minutes, and 5 hours on the respective plots.

2.13 Ride Duration in Seconds, Minutes, and Hours vs. Ride Distance (3 Plots)

Insights:

  • There is a negative correlation between ride duration (in seconds, minutes, and hours) and distance: as duration increases, distance decreases.
  • Most distances are traveled in rides shorter than 20k seconds, 200 minutes, and 5 hours on the respective plots.

2.14 Ride Distance vs. Bike Id (1 Plot)

Insights:

  • There is a very weak negative correlation between distance and the bike IDs used: as distance increases, the number of bike IDs decreases.
  • Most bike IDs are used for distances between 0.2 km and 8 km.

3. Categorical Variable vs. Numeric Variable

We can plot the following 58 categorical vs. numeric relationships. They pair the major categorical variables (user type, member gender, bike share status, age groups, weekdays, and ride start and end stations) with the major numeric variables (member age, duration in seconds, minutes, and hours, ride start and end hour and day, bike ID, and ride distance).

These categorical vs. numeric relationships include:

  1. User Type vs. Member Age (1 Plot)
  2. User Type vs. Duration in Seconds, Minutes, and Hours (3 Plots)
  3. User Type vs. Ride Start and End Time Hour and Day (4 Plots)
  4. User Type vs. Bike Id and Ride Distance (2 Plots)
  5. Member Gender vs. Member Age (1 Plot)
  6. Member Gender vs. Duration in Seconds, Minutes, and Hours (3 Plots)
  7. Member Gender vs. Ride Start and End Time Hour and Day (4 Plots)
  8. Member Gender vs. Bike Id and Ride Distance (2 Plots)
  9. Bike Share Status vs. Member Age (1 Plot)
  10. Bike Share Status vs. Duration in Seconds, Minutes, and Hours (3 Plots)
  11. Bike Share Status vs. Ride Start and End Time Hour and Day (4 Plots)
  12. Bike Share Status vs. Bike Id and Ride Distance (2 Plots)
  13. Ride Stations vs. Member Age (1 Plot)
  14. Ride Stations vs. Duration in Seconds, Minutes, and Hours (3 Plots)
  15. Ride Stations vs. Ride Start and End Time Hour and Day (4 Plots)
  16. Ride Stations vs. Bike Id and Ride Distance (2 Plots)
  17. Weekday vs. Member Age (1 Plot)
  18. Weekday vs. Duration in Seconds, Minutes, and Hours (3 Plots)
  19. Weekday vs. Ride Start and End Time Hour and Day (4 Plots)
  20. Weekday vs. Bike Id and Ride Distance (2 Plots)
  21. Age Groups vs. Duration in Seconds, Minutes, and Hours (3 Plots)
  22. Age Groups vs. Ride Start and End Time Hour and Day (4 Plots)
  23. Age Groups vs. Bike Id and Ride Distance (2 Plots)

3.1 User Type vs. Member Age (2 Plots)

Insights:

  • The average age for both customers and subscribers is 34 years.
  • For both customers and subscribers, 25% of members are younger than 27.
  • The median age for both customers and subscribers is 32 years.
  • 75% of customers are younger than 38, and 75% of subscribers are younger than 39.
  • The maximum age is 141 years for customers and 119 years for subscribers.
  • Member age follows a similar distribution for both user types.
  • In general, customer and subscriber age statistics are nearly identical, except that the 75th percentile is higher for subscribers while the maximum is higher for customers.
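The mean, quartiles, median, and maximum listed above are exactly what pandas' `describe()` reports for each group. A minimal sketch, assuming member age is computed as 2019 minus the `member_birth_year` column of the trip CSV (the notebook's exact age formula is not shown in this section); the sample values are made up:

```python
import pandas as pd

# Made-up sample; column names assumed from the public trip CSV.
trips = pd.DataFrame({
    "user_type": ["Customer", "Customer", "Customer",
                  "Subscriber", "Subscriber", "Subscriber"],
    "member_birth_year": [1990, 1985, 1980, 1995, 1987, 1970],
})

# Assumption: age relative to the data year (February 2019).
trips["member_age"] = 2019 - trips["member_birth_year"]

# count, mean, std, min, 25%, 50% (median), 75%, max for each user type
summary = trips.groupby("user_type")["member_age"].describe()
print(summary[["mean", "25%", "50%", "75%", "max"]])
```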

3.2 User Type vs. Duration in Seconds, Minutes, and Hours (3 Plots)

Insights:

  • Average duration (seconds) is 1,432 seconds for customers and 640 seconds for subscribers.
  • Average duration (minutes) is 24 minutes for customers and 11 minutes for subscribers.
  • Average duration (hours) is 0.4 hours for customers and 0.18 hours for subscribers.
  • In general, customers ride longer than subscribers on average.
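The three duration units are the same quantity rescaled, so the per-group averages differ only by factors of 60 and 3,600. A minimal sketch with made-up rides; `duration_sec` and `user_type` are assumed from the public trip CSV, and `duration_min` / `duration_hr` are hypothetical derived columns:

```python
import pandas as pd

# Made-up sample; column names assumed from the public trip CSV.
trips = pd.DataFrame({
    "user_type": ["Customer", "Customer", "Subscriber", "Subscriber", "Subscriber"],
    "duration_sec": [1800, 1200, 600, 540, 780],
})

# Derive minute and hour columns from seconds (hypothetical column names).
trips["duration_min"] = trips["duration_sec"] / 60
trips["duration_hr"] = trips["duration_sec"] / 3600

# Average duration per user type, in all three units.
avg = trips.groupby("user_type")[["duration_sec", "duration_min", "duration_hr"]].mean()
print(avg.round(2))
```

As a consistency check on the reported figures: 1,432 s / 60 is about 23.9 minutes and 1,432 s / 3,600 is about 0.40 hours, matching the rounded customer values above.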

3.3 User Type vs. Ride Start and End Time Hour and Day (4 Plots)

Insights:

  • Average start time (hour) is about 1:36 PM for customers and about 1:18 PM for subscribers.
  • Average end time (hour) is about 1:54 PM for customers and about 1:36 PM for subscribers.
  • Average start and end time (day of the month) is day 16 for customers and day 15 for subscribers.
  • Start and end time hour for customers is 10 AM and for subscribers is 9 AM.
  • Start and end time hour for customers is 9 AM and for subscribers is 8 AM.
  • In general, subscribers start and end their rides earlier (in both hour and day) than customers.
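The average start and end hours are means of an integer hour-of-day column (0-23), so they come out fractional: a mean of 13.6 corresponds to about 1:36 PM. A minimal sketch, assuming a `start_time` timestamp column as in the public trip CSV; `start_hour`, `start_day`, and the `to_clock` helper are hypothetical names for illustration, not part of the notebook or of pandas:

```python
import pandas as pd

# Made-up sample; `start_time` assumed from the public trip CSV.
trips = pd.DataFrame({
    "user_type": ["Customer", "Customer", "Subscriber", "Subscriber"],
    "start_time": pd.to_datetime([
        "2019-02-16 13:05:00", "2019-02-16 14:40:00",
        "2019-02-15 08:10:00", "2019-02-14 17:55:00",
    ]),
})
trips["start_hour"] = trips["start_time"].dt.hour  # hour of day, 0-23
trips["start_day"] = trips["start_time"].dt.day    # day of the month, 1-28 in February 2019


def to_clock(decimal_hour: float) -> str:
    """Format a fractional 24-hour value such as 13.6 as '1:36 PM' (hypothetical helper)."""
    total_minutes = int(round(float(decimal_hour) * 60))
    hours, minutes = divmod(total_minutes, 60)
    suffix = "AM" if hours % 24 < 12 else "PM"
    hour12 = hours % 12 or 12  # 0 -> 12 AM, 12 -> 12 PM
    return f"{hour12}:{minutes:02d} {suffix}"


mean_hours = trips.groupby("user_type")["start_hour"].mean()
print(mean_hours.map(to_clock))
print(to_clock(13.6))
```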

3.4 User Type vs. Bike Id and Ride Distance (4 Plots)

Insights:

  • Average distance (km) is 1.9 km for customers and 1.7 km for subscribers.
  • Average bike ID used is 4,226 for customers and 4,503 for subscribers.
  • In general, customers travel a longer average distance than subscribers, while the average bike ID used by customers is lower than that of subscribers.

3.5 Member Gender vs. Member Age (2 Plots)

Insights:

  • The average age of female members is 33 years.
  • The average age of male members is 34 years.
  • The average age of members of other genders is 36 years.
  • In general, members of other genders are older on average than male and female members, and male members are older on average than female members.

3.6 Member Gender vs. Duration in Seconds, Minutes, and Hours (3 Plots)

Insights:

  • Average duration (seconds) is 779 seconds for females, 673 seconds for males, 1,189 seconds for not-defined, and 997 seconds for other.
  • Average duration (minutes) is 13 minutes for females, 11 minutes for males, 20 minutes for not-defined, and 17 minutes for other.
  • Average duration (hours) is 0.22 hours for females, 0.19 hours for males, 0.33 hours for not-defined, and 0.28 hours for other.
  • In general, riders of not-defined and other genders ride longer on average than male and female riders, and females ride longer than males.

3.7 Member Gender vs. Ride Start and End Time Hour and Day (4 Plots)

Insights:

  • Average start time (hour) is about 1:12 PM for females, 1:30 PM for males, 1:30 PM for not-defined, and 1:42 PM for other.
  • Average end time (hour) is about 1:24 PM for females, 1:42 PM for males, 1:42 PM for not-defined, and 1:48 PM for other.
  • Average start and end time (day of the month) is day 15 for all genders.
  • In general, the average start and end day is the same across genders, but females start and end their rides earlier in the day than males and all other genders.
  • For all genders, 25% of rides start before 9 AM and 75% start before 5 PM. Likewise, 25% of rides end before 9 AM (10 AM for the other gender), and 75% end before 5 PM (6 PM for males).
  • For all genders, 25% of rides start and end before day 8 of the month, and 75% start before day 22.
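Statements such as "25% of rides start before 9 AM" are group-wise quartiles. A small sketch with made-up values, assuming a hypothetical integer `start_hour` column and the `member_gender` column of the public trip CSV; pandas interpolates linearly between observations by default, so a quartile can fall between whole hours:

```python
import pandas as pd

# Made-up sample; `start_hour` is a hypothetical derived column (0-23).
trips = pd.DataFrame({
    "member_gender": ["Male"] * 4 + ["Female"] * 4,
    "start_hour": [7, 9, 17, 18, 8, 9, 12, 17],
})

# 25th and 75th percentile of start hour per gender, one column per quantile.
quartiles = (
    trips.groupby("member_gender")["start_hour"]
    .quantile([0.25, 0.75])
    .unstack()
)
print(quartiles)
```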

3.8 Member Gender vs. Bike Id and Ride Distance (4 Plots)

Insights:

  • Average distance (km) is 1.8 km for females, 1.7 km for males, 1.7 km for not-defined, and 1.8 km for other.
  • Average bike ID used is 4,397 for females, 4,507 for males, 4,275 for not-defined, and 4,543 for other.
  • In general, females and other genders travel longer average distances than males and not-defined genders. Average bike IDs range from 4,275 to 4,543.

3.9 Bike Share Status vs. Member Age (3 Plots)

Insights:

  • The average age of members who share bikes during the ride is 32 years.
  • The average age of members who don't share bikes during the ride is 34 years.

3.10 Bike Share Status vs. Duration in Seconds, Minutes, and Hours (3 Plots)

Insights:

  • Average duration (seconds) is 730 seconds for non-shared bike rides and 684 seconds for shared rides.
  • Average duration (minutes) is 12 minutes for non-shared bike rides and 11 minutes for shared rides.
  • Average duration (hours) is 0.2 hours for non-shared bike rides and 0.19 hours for shared rides.
  • In general, non-shared bike rides last longer on average than shared bike rides.

3.11 Bike Share Status vs. Ride Start and End Time Hour and Day (4 Plots)

Insights:

  • Average start time (hour) is about 1:24 PM for non-shared bike rides and about 2:06 PM for shared rides.
  • Average end time (hour) is about 1:30 PM for non-shared bike rides and about 2:12 PM for shared rides.
  • Average start and end time (day of the month) is about day 15.3 for both non-shared and shared bike rides.
  • In general, non-shared bike rides start and end earlier in the day than shared bike rides, while the average start and end day is the same (about day 15.3).

3.12 Bike Share Status vs. Bike Id and Ride Distance (4 Plots)

Insights:

  • Average distance (km) is 1.7 km for non-shared bike rides and 1.3 km for shared bike rides, so shared rides are shorter on average.
  • Average bike ID used is 4,483 for non-shared bike rides and 4,379 for shared bike rides.

3.13 Ride Stations vs. Member Age (3 Plots)

Insights:

  • The average member age is 33 years at San Francisco stations, 34 years at Oakland_Berkeley stations, and 31 years at San Jose stations.

3.14 Ride Stations vs. Duration in Seconds, Minutes, and Hours (9 Plots)

Insights:

  • Average duration (seconds) is 812.6 seconds at San Francisco stations, 747.5 seconds at Oakland_Berkeley stations, and 752.6 seconds at San Jose stations.
  • Average duration (minutes) is 13.5 minutes at San Francisco stations, 12.5 minutes at Oakland_Berkeley stations, and 12.5 minutes at San Jose stations.
  • Average duration (hours) is 0.23 hours at San Francisco stations, 0.21 hours at Oakland_Berkeley stations, and 0.21 hours at San Jose stations.
  • In general, rides at San Francisco stations last longer than those at Oakland_Berkeley and San Jose stations.

3.15 Ride Stations vs. Ride Start Time Hour and Day (6 Plots)

Insights:

  • Average start time (hour) is about 1:06 PM at San Francisco stations, about 1:03 PM at Oakland_Berkeley stations, and about 2:12 PM at San Jose stations.
  • Average start time (day of the month) is day 16 at San Francisco stations, day 15 at Oakland_Berkeley stations, and day 16 at San Jose stations.
  • In general, rides at Oakland_Berkeley stations start earlier (in both hour and day) than those at San Francisco or San Jose stations.

3.16 Ride Stations vs. Bike Id and Ride Distance (6 Plots)

Insights:

  • Average distance (km) is 1.9 km at San Francisco stations, 1.7 km at Oakland_Berkeley stations, and 1.7 km at San Jose stations.
  • Average bike ID used is 4,651 at San Francisco stations, 4,104 at Oakland_Berkeley stations, and 3,764 at San Jose stations.
  • In general, rides at San Francisco stations cover longer distances than those at Oakland_Berkeley or San Jose stations.

3.17 Weekday vs. Member Age (4 Plots)

Insights:

  • For both start and end weekdays, the average member age is 34 years, except on Saturday and Sunday (33 years) and Thursday (35 years).

3.18 Weekday vs. Duration in Seconds, Minutes, and Hours (3 Plots)

Insights:

  • Average durations on weekend days (Saturday and Sunday) are higher than on working days.
  • Average duration (seconds) on working days (Monday-Friday) ranges from 663 to 713 seconds, while on weekend days it is between 903 and 920 seconds.
  • Average duration (minutes) on working days (Monday-Friday) ranges from 11.1 to 11.9 minutes, while on weekend days it is between 15 and 15.3 minutes.
  • Average duration (hours) on working days (Monday-Friday) ranges from 0.18 to 0.2 hours, while on weekend days it is between 0.25 and 0.26 hours.
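Weekday names and the working-day/weekend split can both be derived from the ride timestamp. A minimal sketch with made-up rides, assuming the `start_time` and `duration_sec` columns of the public trip CSV; `start_weekday` and `day_type` are hypothetical derived columns. In pandas, `dt.dayofweek` numbers Monday as 0 and Sunday as 6:

```python
import pandas as pd

# Made-up sample; February 1, 2019 was a Friday.
trips = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-02-04 08:15", "2019-02-05 17:40",   # Monday, Tuesday
        "2019-02-09 11:00", "2019-02-10 15:30",   # Saturday, Sunday
    ]),
    "duration_sec": [600, 720, 900, 930],
})
trips["start_weekday"] = trips["start_time"].dt.day_name()
# dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
trips["day_type"] = trips["start_time"].dt.dayofweek.map(
    lambda d: "Weekend" if d >= 5 else "Weekday"
)

by_weekday = trips.groupby("start_weekday")["duration_sec"].mean()
by_day_type = trips.groupby("day_type")["duration_sec"].mean()
print(by_weekday)
print(by_day_type)
```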

3.19 Weekday vs. Ride Start and End Time Hour and Day (4 Plots)

Insights:

  • Average start time (hour) on working days (Monday-Friday) ranges from about 12:48 PM to 1:42 PM, while on weekend days it is between about 1:42 PM and 2:12 PM.
  • Average end time (hour) on working days (Monday-Friday) ranges from about 1:00 PM to 1:54 PM, while on weekend days it is between about 1:48 PM and 2:24 PM.
  • Average start and end time (day of the month) ranges from day 13 to day 17 on working days (Monday-Friday) and from day 14 to day 15 on weekend days.
  • In general, rides on working days start and end earlier in the day than rides on weekend days.

3.20 Weekday vs. Bike Id and Ride Distance (2 Plots)

Insights:

  • Average distance (km) on working days (Monday-Friday), by start and end weekday, ranges from 1.67 km to 1.73 km, while on weekend days it is 1.6 km.
  • Average bike ID used on working days (Monday-Friday), by start and end weekday, ranges from 4,380 to 4,518, while on weekend days it is between 4,575 and 4,628.
  • In general, the average distance (km) is higher on working days than on weekend days, and the average bike ID used is higher on weekend days than on working days.

3.21 Age Groups vs. Duration in Seconds, Minutes, and Hours (3 Plots)

Insights:

  • Average duration is highest for age group 60-70 (0.21 hours), followed by group 50-60 (0.204 hours), group 18-20 (0.201 hours), and groups 20-50 (0.191-0.198 hours); it is lowest for group 70-141 (0.174 hours).
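The age groups used in this section (0-20, 20-30, ..., 70-141) can be built with `pandas.cut`. The bin edges below are inferred from the group labels, and which edge each bin includes is an assumption: by default pandas includes the right edge, so an age of exactly 30 falls in "20-30". The sample ages are made up.

```python
import pandas as pd

# Made-up ages; 141 is the maximum age reported in this dataset.
ages = pd.Series([19, 25, 30, 34, 45, 58, 66, 80, 141], name="member_age")

# Bin edges inferred from the group labels (assumption).
bins = [0, 20, 30, 40, 50, 60, 70, 141]
labels = ["0-20", "20-30", "30-40", "40-50", "50-60", "60-70", "70-141"]

# right=True (the default): each bin is (lower, upper], so 30 -> "20-30" and 141 -> "70-141".
age_group = pd.cut(ages, bins=bins, labels=labels)
print(pd.DataFrame({"member_age": ages, "age_group": age_group}))
```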

3.22 Age Groups vs. Ride Start and End Time Hour and Day (4 Plots)

Insights:

  • Average start time (hour) is earliest for age groups 60-70 and 70-141 (about 12:49-12:52 PM), followed by groups 20-30 through 50-60 (about 1:00-1:42 PM), and latest for group 0-20 (about 2:24 PM).
  • Average end time (hour) is earliest for age groups 60-70 and 70-141 (about 1:00-1:04 PM), followed by groups 20-30 through 50-60 (about 1:12-1:54 PM), and latest for group 0-20 (about 2:30 PM).
  • Average start and end time (day of the month) is earliest for age group 0-20 (day 14), followed by groups 20-30 through 60-70 (day 15), and latest for group 70-141 (day 16).

3.23 Age Groups vs. Bike Id and Ride Distance (2 Plots)

Insights:

  • Average distance (km) is highest for age group 30-40 (1.8 km), followed by age groups 20-30 and 40-70 (1.6-1.7 km), and lowest for age groups 0-20 and 70-141 (1.3-1.5 km).
  • Average bike ID used is highest for age groups 20-40 (4,517-4,520), followed by age groups 0-20, 40-50, 50-60, and 70-141 (4,331-4,409), and lowest for age group 60-70 (4,090).

Summary of Bivariate Relationships between Variables:

The main features of interest include ride durations, ride times, ride weekdays, ride distance (km), ride stations/city, user types, member genders, member age, and age groups. The relationships among these variables are summarized as follows:

1. Categorical Variable vs. Categorical Variable

User Type vs. Member Gender:

  • Males make up 73% of subscribers, followed by females with 22%.
  • Males make up 58% of customers, followed by females with 23%.

User Type vs. Age Groups:

  • For subscribers, age group 20-30 is the largest (39.7%), followed by age group 30-40 (39.4%), and the smallest is age group 70-141 (0.001%).
  • For customers, age group 20-30 is the largest (40%), followed by age group 30-40 (36.3%), and the smallest is age group 70-141 (0.004%).

User Type vs. Start and End Stations:

  • Subscribers use both start and end stations more than customers do.
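Percentages such as "males make up 73% of subscribers" are row-normalized cross-tabulations. A minimal sketch with made-up rows, assuming the `user_type` and `member_gender` column names of the public trip CSV; `normalize="index"` divides each row by its total, so every user type sums to 100%:

```python
import pandas as pd

# Made-up sample; column names assumed from the public trip CSV.
trips = pd.DataFrame({
    "user_type": ["Subscriber"] * 5 + ["Customer"] * 3,
    "member_gender": ["Male", "Male", "Male", "Female", "Other",
                      "Male", "Female", "Male"],
})

# Share of each gender within each user type, in percent.
shares = pd.crosstab(trips["user_type"], trips["member_gender"], normalize="index") * 100
print(shares.round(1))
```

The same call with other column pairs (for example user type vs. bike share status, or bike share status vs. age group) gives the other percentage breakdowns in this summary.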

Member Gender vs. Age Groups:

  • For males, age group 20-30 is the largest (39%), followed by age group 30-40 (36.3%), and the smallest is age group 70-141 (0.004%).
  • For females, age group 20-30 is the largest (44%), followed by age group 30-40 (36%), and the smallest is age group 70-141 (0.002%).
  • For the other gender, age group 30-40 is the largest (43.3%), followed by age group 20-30 (33%), and the smallest is age group 70-141 (0.0025%).

Member Gender vs. Ride Stations:

  • Males use both start and end stations more than other genders do.

2. Numeric Variable vs. Numeric Variable:

Member Age vs. Ride Duration in Seconds, Minutes, Hours:

  • There is a negative correlation between member age and ride duration: duration decreases as age increases.
  • Ages 23-40 have higher ride durations than other ages.

Member Age vs. Ride Start Time Hour and End Time Hour:

  • There is a negative correlation between member age and start and end hour: start and end hours decrease as age increases.
  • Most rides start and end at 7-9 AM and 4-6 PM.
  • Ages 18-40 account for the largest number of rides, which mostly start and end at 7-9 AM and 4-6 PM.

Member Age vs. Ride Start Time Day and End Time Day:

  • There is a negative correlation between member age and start and end day: as age increases, the start and end day decreases.
  • Although the number of rides decreases as age increases, riders of all ages mostly start and end their rides on days 4-9, 12-17, and 19-28 of the month.
  • Ages 18-40 account for the largest number of rides on most start and end days.

Member Age vs. Ride Distance:

  • 99.9% of ride distances are below 8 km, and distances range from 0.17 km to 8 km.
  • In general, there is a very weak positive correlation between distance (km) and member age.

Ride Start and End Time Hour vs. Ride Duration in Seconds, Minutes, and Hours:

  • There is a very weak positive correlation between start and end hour and ride duration: as start and end hours increase, durations increase slightly.

Ride Start and End Time Day vs. Ride Duration in Seconds, Minutes, and Hours:

  • There is a very weak positive correlation between start and end day and ride duration: as the start and end day increases, durations increase slightly.

Ride Start and End Time Hour vs. Ride Distance:

  • There is a very weak negative correlation between start and end hour and distance (km): as the start and end hour increases, distance decreases slightly.

Ride Start and End Time Day vs. Ride Distance:

  • There is a very weak positive correlation between start and end day and distance (km): as the start and end day increases, distance increases slightly.

Ride Duration in Seconds, Minutes, and Hours vs. Ride Distance:

  • There is a negative correlation between ride duration (in seconds, minutes, and hours) and distance (km): as duration increases, distance decreases.
  • Most distances are covered in rides shorter than 20k seconds, 200 minutes, and 5 hours, respectively.

3. Categorical Variable vs. Numeric Variable:

User Type vs. Member Age:

  • The average age for both customers and subscribers is 34 years.
  • For both customers and subscribers, 25% of members are younger than 27.
  • The median age for both customers and subscribers is 32 years.
  • 75% of customers are younger than 38, and 75% of subscribers are younger than 39.
  • The maximum age is 141 years for customers and 119 years for subscribers.

User Type vs. Duration in Seconds, Minutes, and Hours:

  • Average duration (seconds) is 1,432 seconds for customers and 640 seconds for subscribers.
  • Average duration (minutes) is 24 minutes for customers and 11 minutes for subscribers.
  • Average duration (hours) is 0.4 hours for customers and 0.18 hours for subscribers.
  • In general, customers ride longer than subscribers on average.

User Type vs. Ride Start and End Time Hour and Day:

  • Average start time (hour) is about 1:36 PM for customers and about 1:18 PM for subscribers.
  • Average end time (hour) is about 1:54 PM for customers and about 1:36 PM for subscribers.
  • Average start and end time (day of the month) is day 16 for customers and day 15 for subscribers.
  • Start and end time hour for customers is 10 AM and for subscribers is 9 AM.
  • Start and end time hour for customers is 9 AM and for subscribers is 8 AM.
  • In general, subscribers start and end their rides earlier (in both hour and day) than customers.

User Type vs. Bike Id and Ride Distance:

  • Average distance (km) is 1.9 km for customers and 1.7 km for subscribers.
  • Average bike ID used is 4,226 for customers and 4,503 for subscribers.
  • In general, customers travel a longer average distance than subscribers, while the average bike ID used by customers is lower than that of subscribers.

Member Gender vs. Member Age:

  • The average age of female members is 33 years.
  • The average age of male members is 34 years.
  • The average age of members of other genders is 36 years.
  • In general, members of other genders are older on average than male and female members, and male members are older on average than female members.

Member Gender vs. Duration in Seconds, Minutes, and Hours:

  • Average duration (seconds) is 779 seconds for females, 673 seconds for males, 1,189 seconds for not-defined, and 997 seconds for other.
  • Average duration (minutes) is 13 minutes for females, 11 minutes for males, 20 minutes for not-defined, and 17 minutes for other.
  • Average duration (hours) is 0.22 hours for females, 0.19 hours for males, 0.33 hours for not-defined, and 0.28 hours for other.
  • In general, riders of not-defined and other genders ride longer on average than male and female riders, and females ride longer than males.

Member Gender vs. Ride Start and End Time Hour and Day:

  • Average start time (hour) is about 1:12 PM for females, 1:30 PM for males, 1:30 PM for not-defined, and 1:42 PM for other.
  • Average end time (hour) is about 1:24 PM for females, 1:42 PM for males, 1:42 PM for not-defined, and 1:48 PM for other.
  • Average start and end time (day of the month) is day 15 for all genders.
  • In general, the average start and end day is the same across genders, but females start and end their rides earlier in the day than males and all other genders.
  • For all genders, 25% of rides start before 9 AM and 75% start before 5 PM. Likewise, 25% of rides end before 9 AM (10 AM for the other gender), and 75% end before 5 PM (6 PM for males).
  • For all genders, 25% of rides start and end before day 8 of the month, and 75% start before day 22.

Member Gender vs. Ride Distance:

  • Average distance (km) is 1.8 km for females, 1.7 km for males, 1.7 km for not-defined, and 1.8 km for other.
  • In general, females and other genders travel longer average distances than males and not-defined genders.

Ride Stations vs. Member Age:

  • The average member age is 33 years at San Francisco stations, 34 years at Oakland_Berkeley stations, and 31 years at San Jose stations.

Ride Stations vs. Duration in Seconds, Minutes, and Hours:

  • Average duration (seconds) is 812.6 seconds at San Francisco stations, 747.5 seconds at Oakland_Berkeley stations, and 752.6 seconds at San Jose stations.
  • Average duration (minutes) is 13.5 minutes at San Francisco stations, 12.5 minutes at Oakland_Berkeley stations, and 12.5 minutes at San Jose stations.
  • Average duration (hours) is 0.23 hours at San Francisco stations, 0.21 hours at Oakland_Berkeley stations, and 0.21 hours at San Jose stations.
  • In general, rides at San Francisco stations last longer than those at Oakland_Berkeley and San Jose stations.

Ride Stations vs. Ride Start Time Hour and Day:

  • Average start time (hour) is about 1:06 PM at San Francisco stations, about 1:03 PM at Oakland_Berkeley stations, and about 2:12 PM at San Jose stations.
  • Average start time (day of the month) is day 16 at San Francisco stations, day 15 at Oakland_Berkeley stations, and day 16 at San Jose stations.
  • In general, rides at Oakland_Berkeley stations start earlier (in both hour and day) than those at San Francisco or San Jose stations.

Ride Stations vs. Bike Id and Ride Distance:

  • Average distance (km) is 1.9 km at San Francisco stations, 1.7 km at Oakland_Berkeley stations, and 1.7 km at San Jose stations.
  • Average bike ID used is 4,651 at San Francisco stations, 4,104 at Oakland_Berkeley stations, and 3,764 at San Jose stations.
  • In general, rides at San Francisco stations cover longer distances than those at Oakland_Berkeley or San Jose stations.

Weekday vs. Member Age:

  • For both start and end weekdays, the average member age is 34 years, except on Saturday and Sunday (33 years) and Thursday (35 years).

Weekday vs. Duration in Seconds, Minutes, and Hours:

  • Average durations on weekend days (Saturday and Sunday) are higher than on working days.
  • Average duration (seconds) on working days (Monday-Friday) ranges from 663 to 713 seconds, while on weekend days it is between 903 and 920 seconds.
  • Average duration (minutes) on working days (Monday-Friday) ranges from 11.1 to 11.9 minutes, while on weekend days it is between 15 and 15.3 minutes.
  • Average duration (hours) on working days (Monday-Friday) ranges from 0.18 to 0.2 hours, while on weekend days it is between 0.25 and 0.26 hours.

Weekday vs. Ride Start and End Time Hour and Day:

  • Average start time (hour) on working days (Monday-Friday) ranges from about 12:48 PM to 1:42 PM, while on weekend days it is between about 1:42 PM and 2:12 PM.
  • Average end time (hour) on working days (Monday-Friday) ranges from about 1:00 PM to 1:54 PM, while on weekend days it is between about 1:48 PM and 2:24 PM.
  • Average start and end time (day of the month) ranges from day 13 to day 17 on working days (Monday-Friday) and from day 14 to day 15 on weekend days.
  • In general, rides on working days start and end earlier in the day than rides on weekend days.

Weekday vs. Bike Id and Ride Distance:

  • Average distance (km) on working days (Monday-Friday), by start and end weekday, ranges from 1.67 km to 1.73 km, while on weekend days it is 1.6 km.
  • In general, the average distance (km) is higher on working days than on weekend days.

Age Groups vs. Duration in Seconds, Minutes, and Hours:

  • Average duration is highest for age group 60-70 (0.21 hours), followed by group 50-60 (0.204 hours), group 18-20 (0.201 hours), and groups 20-50 (0.191-0.198 hours); it is lowest for group 70-141 (0.174 hours).

Age Groups vs. Ride Start and End Time Hour and Day:

  • Average start time (hour) is earliest for age groups 60-70 and 70-141 (about 12:49-12:52 PM), followed by groups 20-30 through 50-60 (about 1:00-1:42 PM), and latest for group 0-20 (about 2:24 PM).
  • Average end time (hour) is earliest for age groups 60-70 and 70-141 (about 1:00-1:04 PM), followed by groups 20-30 through 50-60 (about 1:12-1:54 PM), and latest for group 0-20 (about 2:30 PM).
  • Average start and end time (day of the month) is earliest for age group 0-20 (day 14), followed by groups 20-30 through 60-70 (day 15), and latest for group 70-141 (day 16).

Age Groups vs. Ride Distance:

  • Average distance (km) is highest for age group 30-40 (1.8 km), followed by age groups 20-30 and 40-70 (1.6-1.7 km), and lowest for age groups 0-20 and 70-141 (1.3-1.5 km).

Interesting relationships between other features (not the main features of interest):

Other additional features include bike share status and bike ID, and how they are associated with the main features. We can investigate them as follows:

1. Bike Share Status:

Bike Share vs. Ride Stations:
  • Non-shared bike rides outnumber shared bike rides at both start and end stations.

Bike Share Status vs. Member Age:

  • The average age of members who share bikes during the ride is 32 years.
  • The average age of members who don't share bikes during the ride is 34 years.

Bike Share Status vs. Duration in Seconds, Minutes, and Hours:

  • Average duration (seconds) is 730 seconds for non-shared bike rides and 684 seconds for shared rides.
  • Average duration (minutes) is 12 minutes for non-shared bike rides and 11 minutes for shared rides.
  • Average duration (hours) is 0.2 hours for non-shared bike rides and 0.19 hours for shared rides.
  • In general, non-shared bike rides last longer on average than shared bike rides.

User Type vs. Bike Share:

  • 89.4% of subscribers don't share bikes during their rides and 10.6% do, while customers never share bikes during their rides.

Member Gender vs. Bike Share:

  • 90% of males don't share bikes during the ride and 10% do.
  • 91% of females don't share bikes during the ride and 9% do.
  • No users with a not-defined gender share bikes during the ride.
  • 82% of users of other genders don't share bikes during the ride and 18% do.

Bike Share vs. Age Groups:

  • For non-shared rides, age group 30-40 is the largest (39%), followed by age group 20-30 (38%), and the smallest is age group 70-141 (0.0024%).
  • For shared rides, age group 20-30 is the largest (58%), followed by age group 30-40 (13%), and the smallest is age group 70-141 (0.01%).

Bike Share Status vs. Ride Start and End Time Hour and Day:

  • Average start time (hour) is about 1:24 PM for non-shared bike rides and about 2:06 PM for shared rides.
  • Average end time (hour) is about 1:30 PM for non-shared bike rides and about 2:12 PM for shared rides.
  • Average start and end time (day of the month) is about day 15.3 for both non-shared and shared bike rides.
  • In general, non-shared bike rides start and end earlier in the day than shared bike rides, while the average start and end day is the same (about day 15.3).

Bike Share Status vs. Bike Id and Ride Distance:

  • Average distance (km) is 1.7 km for non-shared bike rides and 1.3 km for shared bike rides, so shared rides are shorter on average.
  • Average bike ID used is 4,483 for non-shared bike rides and 4,379 for shared bike rides.

2. Bike Id:

Member Age vs. Bike Id:

  • There is a negative correlation between member age and bike ID: as age increases, the range of bike IDs used decreases.

Ride Start and End Time Hour vs. Bike Id:

  • There is a positive correlation between start and end hour and bike ID: as start and end hours increase, the range of bike IDs used widens, extending from about 4k upward, with the most used bikes having IDs between 4.5k and 5.3k.

Ride Start and End Time Day vs. Bike Id:

  • There is a positive correlation between start and end day and bike ID: as start and end days increase, the range of bike IDs used widens, extending from about 4k upward, with the most used bikes having IDs between 4.5k and 5.3k.

Ride Duration in Seconds, Minutes, and Hours vs. Bike Id:

  • There is a weak negative correlation between ride duration (in seconds, minutes, and hours) and bike ID: as duration increases, the number of bike IDs used decreases.
  • Most bike IDs are used for durations below 10k seconds, below 200 minutes, and below 5 hours in the respective plots.

Ride Distance vs. Bike Id:

  • There is a very weak negative correlation between distance (km) and bike ID: as distance increases, bike IDs decrease slightly.
  • Most bike IDs are used for rides between 0.2 km and 8 km.

Member Gender vs. Bike Id:

  • Average bike ID used is 4,397 for females, 4,507 for males, 4,275 for not-defined, and 4,543 for other.
  • Average bike IDs range from 4,275 to 4,543.

Weekday vs. Bike Id:

  • Average bike ID used on working days (Monday-Friday), by start and end weekday, ranges from 4,380 to 4,518, while on weekend days it is between 4,575 and 4,628.
  • Average bike IDs used on weekend days are higher than those on working days.

Age Groups vs. Bike Id:

  • Average bike ID used is highest for age groups 20-40 (4,517-4,520), followed by age groups 0-20, 40-50, 50-60, and 70-141 (4,331-4,409), and lowest for age group 60-70 (4,090).

Multivariate Exploration

  • In this section, I will investigate and plot multiple variables to explore the patterns, trends, models, and relationships among three or more features. The main thing I want to explore in this part of the analysis is how these variables of interest correlate with and affect one another.
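A first step toward these multivariate comparisons is a table of one numeric variable across two categorical variables at once, which is what the faceted plots below visualize. A minimal sketch using `DataFrame.pivot_table`, with made-up rows and column names assumed from the public trip CSV:

```python
import pandas as pd

# Made-up sample; column names assumed from the public trip CSV.
trips = pd.DataFrame({
    "user_type": ["Customer", "Customer", "Subscriber", "Subscriber", "Subscriber"],
    "member_gender": ["Male", "Female", "Male", "Female", "Female"],
    "duration_sec": [1300, 1500, 600, 700, 760],
})

# Mean ride duration for every user type x gender combination.
table = trips.pivot_table(
    index="user_type",
    columns="member_gender",
    values="duration_sec",
    aggfunc="mean",
)
print(table)
```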

Variables Map:

1. Categorical Variables:

  1. User characteristics: User Type, Member Gender, and Age Groups.
  2. Ride Start and End Times: Start and End Weekdays.
  3. Ride Stations: Start and End Stations
  4. Bike Share: Bike Share Status

2. Numeric Variables:

  1. Member Age
  2. Ride Durations: Durations in Seconds, Minutes, and Hours
  3. Ride Start and End Times: Ride Start and End Times in Hour and Day
  4. Ride Distance(km)
  5. Bike Id

1. Plotting Correlation Matrix: Numeric Variables

Insights:

1. Correlation of Durations vs. Other Numeric Variables:

  • Durations vs. Start Time (hour): Very weak positive correlation
  • Durations vs. End Time (hour): Near-zero correlation
  • Durations vs. Start Time (day): Very weak positive correlation
  • Durations vs. End Time (day): Very weak positive correlation
  • Durations vs. Member Age: Very weak negative correlation
  • Durations vs. Distance (km): Very weak positive correlation
  • Durations vs. Bike Id: Very weak negative correlation

2. Correlation of Start & End Time (hour) vs. Other Numeric Variables:

  • Start & End Time (hour) vs. Durations: Very weak positive correlation
  • Start Time (hour) vs. End Time (hour): Strong positive correlation
  • Start & End Time (hour) vs. Start & End Time (day): Very weak positive correlation
  • Start & End Time (hour) vs. Member Age: Very weak negative correlation
  • Start & End Time (hour) vs. Distance (km): Very weak negative correlation
  • Start & End Time (hour) vs. Bike Id: Very weak positive correlation

3.Correlation of Start & End Time(day) vs. Other Numeric Variables:

  • Start & End Time(day) vs. Durations: Very weak positive Coorelation
  • Start & End Time(day) vs. Start & End Time(hour): Very weak positive Coorelation
  • Start & End Time(day) vs. Member Age: Neutral Coorelation
  • Start & End Time(day) vs. Distance(km): Very weak positive Coorelation
  • Start & End Time(day) vs. Bike Id: Very weak positive Coorelation

4.Correlation of Member Age vs. Other Numeric Variables:

  • Member Age vs. Durations: Very weak negative Coorelation
  • Member Age vs. Start & End Time(hour): Very weak negative Coorelation
  • Member Age vs. Start & End Time(day): Neutral Coorelation
  • Member Age vs. Distance(km): Very weak positive Coorelation
  • Member Age vs. Bike Id: Very weak negative Coorelation

5.Correlation of Distance(km) vs. Other Numeric Variables:

  • Distance(km) vs. Durations: Very weak positive Coorelation
  • Distance(km) vs. Start & End Time(hour): Very weak negative Coorelation
  • Distance(km) vs. Start & End Time(day): Very weak positive Coorelation
  • Distance(km) vs. Member Age: Very weak positive Coorelation
  • Distance(km) vs. Bike Id: Very weak positive Coorelation

6.Correlation of Bike Id vs. Other Numeric Variables:

  • Bike Id vs. Durations: Very weak negative Coorelation
  • Bike Id vs. Start & End Time(hour): Very weak positive Coorelation
  • Bike Id vs. Start & End Time(day): Very weak positive Coorelation
  • Bike Id vs. Member Age: Very weak negative Coorelation
  • Bike Id vs. Bike Id: Very weak positive Coorelation
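A correlation matrix like the one summarized above can be computed with pandas' `DataFrame.corr` (Pearson by default). The sketch below also maps each coefficient to a verbal label. The cut-offs are an assumption, since the analysis does not state which thresholds it used: |r| < 0.02 is treated as "neutral", and 0.2 / 0.4 / 0.6 / 0.8 separate very weak, weak, moderate, strong, and very strong. The demo data is synthetic.

```python
import numpy as np
import pandas as pd


def label_strength(r: float, neutral_cutoff: float = 0.02) -> str:
    """Map a Pearson r to a verbal label (thresholds are an assumption)."""
    if abs(r) < neutral_cutoff:
        return "neutral"
    size = abs(r)
    if size < 0.2:
        strength = "very weak"
    elif size < 0.4:
        strength = "weak"
    elif size < 0.6:
        strength = "moderate"
    elif size < 0.8:
        strength = "strong"
    else:
        strength = "very strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


def correlation_labels(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return the Pearson correlation matrix of `columns` as verbal labels."""
    corr = df[columns].corr()
    # apply() passes each column as a Series; map() labels every cell in it
    return corr.apply(lambda col: col.map(label_strength))


# Synthetic demo: end hour usually equals start hour (short rides),
# occasionally one hour later, capped at 23
rng = np.random.default_rng(0)
n = 1_000
start_hour = rng.integers(0, 24, n)
demo = pd.DataFrame({
    "start_hour": start_hour,
    "end_hour": np.minimum(start_hour + (rng.random(n) < 0.1), 23),
    "member_age": rng.integers(18, 70, n),
})
labels = correlation_labels(demo, ["start_hour", "end_hour", "member_age"])
print(labels)
```

Because most rides start and end within the same hour, a very strong start/end hour correlation is expected by construction, which is why it stands out from the other pairs.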

2. Faceting Many Categorical and Numeric Variables:¶

2.1 User Types and Member Gender vs. Durations (second, minute, and hour)¶

Insights:¶

Duration(second):¶

  • Avg. duration(second) for customers (1432 seconds) is higher than that of subscribers (640 seconds).
  • Avg. duration(second) for Not-defined gender (1189 seconds) is higher than that of Other gender (997 seconds), Females (779 seconds), and Males (673 seconds).

Duration(minute):¶

  • Avg. duration(minute) for customers (24 minutes) is higher than that of subscribers (11 minutes).
  • Avg. duration(minute) for Not-defined gender (20 minutes) is higher than that of Other gender (17 minutes), Females (13 minutes), and Males (11 minutes).

Duration(hour):¶

  • Avg. duration(hour) for customers (0.4 hour) is higher than that of subscribers (0.18 hour).
  • Avg. duration(hour) for Not-defined gender (0.33 hour) is higher than that of Other gender (0.28 hour), Females (0.22 hour), and Males (0.19 hour).
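Averages like these are per-group means. A minimal pandas sketch, assuming the raw `user_type`, `member_gender`, and `duration_sec` columns; the helper name is hypothetical and the toy rows are synthetic. Note that `groupby` drops NaN keys by default, so missing genders need to be imputed (the text reports a "Not-defined" group) before grouping if those rides are to be kept.

```python
import pandas as pd

DURATION_COLS = ["duration_sec", "duration_min", "duration_hour"]


def mean_durations_by(trips: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Average ride duration in seconds, minutes, and hours per group."""
    df = trips.assign(
        duration_min=trips["duration_sec"] / 60,
        duration_hour=trips["duration_sec"] / 3600,
    )
    return df.groupby(group_col)[DURATION_COLS].mean().round(2)


# Synthetic rides, for illustration only
toy = pd.DataFrame({
    "user_type": ["Customer", "Customer", "Subscriber", "Subscriber"],
    "member_gender": ["Female", "Male", "Female", "Male"],
    "duration_sec": [1800, 1200, 720, 600],
})
by_user_type = mean_durations_by(toy, "user_type")
by_gender = mean_durations_by(toy, "member_gender")
print(by_user_type)
print(by_gender)
```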

2.2 User Types and Member Gender vs. Start and End Time Hours and Days¶

Insights:¶

Start Time(hour):¶

  • Avg. Start time(hour) for Subscribers (1.4 PM) is earlier than that of Customers (1.6 PM).
  • Avg. Start time(hour) for Female gender (1.2 PM) is earlier than that of Not-defined (1.5 PM), Male (1.52 PM), and Other (1.7 PM).

End Time(hour):¶

  • Avg. End time(hour) for Subscribers (1.6 PM) is earlier than that of Customers (1.9 PM).
  • Avg. End time(hour) for Female gender (1.4 PM) is earlier than that of Male (1.7 PM), Not-defined (1.72 PM), and Other (1.8 PM).

Start and End Time(day):¶

  • Avg. Start time(day) for Subscribers (day 15) is earlier than that of Customers (day 16).
  • Avg. End time(day) is the same (day 15) for all genders.
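The average times above are written as decimal hours: "1.4 PM" stands for hour 13.4 of the day. Since 0.4 hour is 24 minutes, that is about 1:24 PM. A small helper, hypothetical and not part of the original analysis, converts a decimal hour into 12-hour clock text:

```python
def decimal_hour_to_clock(hour: float) -> str:
    """Convert a decimal hour of day (0 <= hour < 24) to 12-hour clock text."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be in [0, 24)")
    total_minutes = round(hour * 60)
    hh, mm = divmod(total_minutes, 60)
    hh %= 24  # e.g. 23.999 rounds up to 24:00, which wraps to 0:00
    suffix = "AM" if hh < 12 else "PM"
    hh12 = hh % 12 or 12  # 0 -> 12 AM, 12 -> 12 PM
    return f"{hh12}:{mm:02d} {suffix}"


for h in (13.4, 13.6, 12.8, 14.3):
    print(h, "->", decimal_hour_to_clock(h))
```

If the hour variable is the integer hour of day (as `dt.hour` returns), its mean truncates minutes, so the true average clock time is likely somewhat later than the converted value.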

2.3 User Types and Member Gender vs. Member Age, Trip Distance(km), and Bike Ids(most-used)¶

Insights:¶

Member Age:¶

  • Avg. age for Subscribers (33 years) is higher than that of Customers (28 years).
  • Avg. age for Other gender (36 years) is higher than that of both Male (34 years) and Female (33 years).

Distance(km):¶

  • Avg. distance(km) for customers (1.9km) is higher than that of subscribers (1.7km).
  • Avg. distance(km) for Other gender (1.9km) is higher than that of Female (1.77km), Not-defined (1.72km), and Male (1.66km).

Bike Id:¶

  • Avg. most-used bike id for Subscribers (4503) is higher than that of Customers (4226).
  • Avg. most-used bike id for Other gender (4543) is higher than that of Male (4507), Female (4397), and Not-defined (4275).

2.4 Start and End Weekday vs. Durations (second, minute, and hour)¶

Insights:¶

Duration(second):¶

  • Avg. Duration(second) for weekend start days (903-920 seconds) is higher than that of working days (663-713 seconds).
  • Avg. Duration(second) for weekend end days (885-931 seconds) is higher than that of working days (663-714 seconds).

Duration(minute):¶

  • Avg. Duration(minute) for weekend start days (15-15.3 minutes) is higher than that of working days (11-11.9 minutes).
  • Avg. Duration(minute) for weekend end days (14-15.5 minutes) is higher than that of working days (11-11.9 minutes).

Duration(hour):¶

  • Avg. Duration(hour) for weekend start days (0.25-0.26 hour) is higher than that of working days (0.18-0.20 hour).
  • Avg. Duration(hour) for weekend end days (0.25-0.26 hour) is higher than that of working days (0.18-0.20 hour).
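The weekend vs. working-day comparison can be sketched by grouping on the weekday name of `start_time` and flagging Saturday and Sunday; the same code applies to `end_time` for the end-weekday figures. The helper name and the `is_weekend` column are hypothetical, and the toy rides are synthetic (2-3 February 2019 fell on a weekend).

```python
import pandas as pd


def weekday_duration_summary(trips: pd.DataFrame) -> pd.DataFrame:
    """Mean duration (s, min, h) per start weekday, flagged weekend/working day."""
    start = pd.to_datetime(trips["start_time"])
    df = trips.assign(
        start_weekday=start.dt.day_name(),
        is_weekend=start.dt.dayofweek >= 5,  # Monday=0 ... Sunday=6
    )
    summary = df.groupby(["start_weekday", "is_weekend"])["duration_sec"].mean().to_frame()
    summary["duration_min"] = summary["duration_sec"] / 60
    summary["duration_hour"] = summary["duration_sec"] / 3600
    return summary.reset_index().set_index("start_weekday")


toy = pd.DataFrame({
    "start_time": ["2019-02-02 10:00", "2019-02-03 11:00",
                   "2019-02-04 08:00", "2019-02-05 09:00"],
    "duration_sec": [900, 930, 660, 700],
})
weekday_summary = weekday_duration_summary(toy)
print(weekday_summary)
```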

2.5 Start and End Weekday vs. Start and End Time Hours and Days¶

Insights:¶

Start & End Time(hour):¶

  • Avg. Start Time(hour) for working days (12.8 PM-1.7 PM) is earlier than that of weekend days (1.7 PM-2.3 PM).
  • Avg. End Time(hour) for working days (1 PM-1.9 PM) is earlier than that of weekend days (1.8 PM-2.3 PM).

Start & End Time(day):¶

  • Avg. Start and End Time(day) ranges from day 12 to day 17 for working days and from day 14 to day 15 for weekend days.

2.6 Start and End Weekday vs. Member Age, Trip Distance(km), and Bike Ids(most-used)¶

Insights:¶

Member Age:¶

  • Avg. Member Age for weekend start and end days (30-31 years) is lower than that of working days (31-33 years).

Distance(km):¶

  • Avg. Distance(km) for working start and end days (1.7-1.72km) is higher than that of weekend days (1.6km).

Bike Id:¶

  • Avg. Bike Id for weekend start and end days (4575-4628) is higher than that of working days (4400-4518).

2.7 Bike Share Status and Age Groups vs. Durations (second, minute, and hour)¶

Insights:¶

Duration(second):¶

  • Avg. Duration(second) for non-shared bikes (730 seconds) is higher than that of shared bikes (684 seconds).
  • Avg. Duration(second) for age group 60-70 (764 seconds) is higher than that of the other groups (626-736 seconds): the middle groups 18-60 average 689-736 seconds, and the lowest is age group 70-140 (626 seconds).

Duration(minute):¶

  • Avg. Duration(minute) for non-shared bikes (12 minutes) is higher than that of shared bikes (11 minutes).
  • Avg. Duration(minute) for age group 60-70 (12.73 minutes) is higher than that of the other groups (10.4-12.27 minutes): the middle groups 18-60 average 11.5-12.27 minutes, and the lowest is age group 70-140 (10.4 minutes).

Duration(hour):¶

  • Avg. Duration(hour) for non-shared bikes (0.20 hour) is higher than that of shared bikes (0.19 hour).
  • Avg. Duration(hour) for age group 60-70 (0.21 hour) is higher than that of the other groups (0.17-0.20 hour): the middle groups 18-60 average 0.19-0.20 hour, and the lowest is age group 70-140 (0.17 hour).
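The age groups can be built with `pd.cut` and then averaged alongside the bike share status. In this sketch the bin edges are an assumption chosen to match the group names in the text (`pd.cut` intervals are right-closed by default, e.g. (20, 30]), the "Bike Share Status" variable is assumed to come from the raw `bike_share_for_all_trip` column (Yes/No), `member_age` is a derived column, and the rows are synthetic.

```python
import pandas as pd

# Hypothetical bin edges matching the age groups named in the text
AGE_EDGES = [0, 20, 30, 40, 50, 60, 70, 141]
AGE_LABELS = ["0-20", "20-30", "30-40", "40-50", "50-60", "60-70", "70-141"]


def mean_duration_by_share_and_age(trips: pd.DataFrame) -> tuple[pd.Series, pd.Series]:
    """Mean duration_sec per bike share status and per age group."""
    df = trips.assign(
        age_group=pd.cut(trips["member_age"], bins=AGE_EDGES, labels=AGE_LABELS)
    )
    by_share = df.groupby("bike_share_for_all_trip")["duration_sec"].mean()
    # observed=True keeps only the age groups that actually occur in the data
    by_age = df.groupby("age_group", observed=True)["duration_sec"].mean()
    return by_share, by_age


toy = pd.DataFrame({
    "member_age": [19, 25, 65, 66, 75],
    "bike_share_for_all_trip": ["No", "Yes", "No", "No", "No"],
    "duration_sec": [600, 680, 760, 768, 626],
})
by_share, by_age = mean_duration_by_share_and_age(toy)
print(by_share)
print(by_age)
```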

2.8 Bike Share Status and Age Groups vs. Start and End Time Hours and Days¶

Insights:¶

Start & End Time(hour):¶

  • Avg. Start Time(hour) for non-shared bikes (1.4 PM) is earlier than that of shared bikes (2.1 PM).
  • Avg. End Time(hour) for non-shared bikes (1.5 PM) is earlier than that of shared bikes (2.3 PM).
  • Avg. Start Time(hour) for age group 70-141 (12.82 PM) is earlier than that of the other age groups (12.87 PM-2.4 PM): the middle groups 20-70 average 12.9 PM-1.75 PM, and the youngest group, 18-20, starts latest (2.4 PM).
  • Avg. End Time(hour) for age group 70-141 (1 PM) is earlier than that of the other age groups (1.05 PM-2.5 PM): the middle groups 20-70 average 1.05 PM-1.9 PM, and the youngest group, 18-20, ends latest (2.5 PM).

Start & End Time(day):¶

  • Avg. Start and End Time(day) for both non-shared and shared bikes is day 15.
  • Avg. Start and End Time(day) is day 15 for age groups 18-70 and day 17 for age group 70-141.

2.9 Bike Share Status and Age Groups vs. Member Age, Trip Distance(km), and Bike Ids(most-used)¶

Insights:¶

Member Age:¶

  • Avg. Member Age is 33 years for non-shared bikes and 32 years for shared bikes.

Distance(km):¶

  • Avg. Distance(km) is 1.7km for non-shared bikes and 1.3km for shared bikes.
  • Avg. Distance(km) for age group 30-40 (1.8km) is higher than that of the other age groups (1.3km-1.7km); most of the other groups average 1.5km-1.7km, while the youngest group, 18-20, averages only 1.3km.

Bike Id:¶

  • Avg. Bike Id for non-shared bikes (4483) is higher than that of shared bikes (4379).
  • Avg. Bike Id for age group 30-40 (4520) is higher than that of the other age groups (4090-4517); the lowest is age group 60-70 (4090).
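The text does not show how Distance(km) was computed. A common choice, assumed here, is the haversine great-circle distance between the start and end station coordinates (raw columns `start_station_latitude`, `start_station_longitude`, `end_station_latitude`, `end_station_longitude`). Note that this is the straight-line distance, not the route actually ridden, and it is 0 for round trips that return to the starting station. The `distance_km` column name and the demo coordinates are illustrative.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees.

    Works element-wise on scalars, NumPy arrays, or pandas Series.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def add_distance_km(trips: pd.DataFrame) -> pd.DataFrame:
    """Add a straight-line distance_km column from station coordinates."""
    return trips.assign(distance_km=haversine_km(
        trips["start_station_latitude"], trips["start_station_longitude"],
        trips["end_station_latitude"], trips["end_station_longitude"],
    ))


# Row 0 is a round trip (same station); row 1 uses illustrative coordinates
trips = pd.DataFrame({
    "start_station_latitude": [37.7749, 37.7749],
    "start_station_longitude": [-122.4194, -122.4194],
    "end_station_latitude": [37.7749, 37.8044],
    "end_station_longitude": [-122.4194, -122.2712],
})
with_distance = add_distance_km(trips)
print(with_distance["distance_km"].round(2))
```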

3. Plotting Three Numeric Variables: Encoding via Size and Color¶

3.1 Member Age and Ride Distance(km) vs. Ride Start and End Times in Hours and Days¶

Insights:¶

  • There is a negative correlation between member age and distance(km) across start and end time hours and days: as member age increases, the distance decreases.

3.2 Member Age and Ride Distance(km) vs. Durations in Seconds, Minutes, and Hours¶

Insights:¶

  • There is a negative correlation between member age and distance(km) across durations: as member age increases, the distance decreases for durations in seconds, minutes, and hours.

3.3 Member Age and Ride Distance(km) vs. Bike Id¶

Insights:¶

  • There is a negative correlation between member age and distance(km) across bike ids: as member age increases, the distance decreases.

4. Plotting Two Numeric Variables vs. One Categorical Variable: Encoding the Categorical Variable via Shape and Color¶

4.1 Member Age and Ride Distance(km) vs. User Type¶

Insights:¶

  • There is a negative correlation between member age and distance(km) for each user type: as member age increases, the distance decreases for both subscribers and customers.

4.2 Member Age and Ride Distance(km) vs. Member Gender¶

Insights:¶

  • There is a negative correlation between member age and distance(km) for each member gender: as member age increases, the distance decreases for male, female, and other genders alike.

4.3 Member Age and Ride Distance(km) vs. Age Groups¶

Insights:¶

  • There is a negative correlation between member age and distance(km) across age groups: as member age increases, the distance decreases from one age group to the next.

4.4 Member Age and Ride Distance(km) vs. Ride Start and End Weekdays¶

Insights:¶

  • There is a negative correlation between member age and distance(km) for each start and end weekday: as member age increases, the distance decreases on every weekday.

4.5 Member Age and Ride Distance(km) vs. Bike Share Status¶

Insights:¶

  • There is a negative correlation between member age and distance(km) for each bike share status: as member age increases, the distance decreases for both shared and non-shared bikes.
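The negative trends in this section are read off scatter plots. One way to check them numerically is to compute Pearson's r between age and distance separately within each category. A minimal sketch, assuming a derived `member_age` column and a hypothetical `distance_km` column; the demo data is synthetic and built with a negative trend on purpose.

```python
import numpy as np
import pandas as pd


def age_distance_corr_by(trips: pd.DataFrame, group_col: str) -> pd.Series:
    """Pearson r between member_age and distance_km within each group."""
    # Selecting the two columns first keeps the grouping column out of apply()
    return trips.groupby(group_col)[["member_age", "distance_km"]].apply(
        lambda g: g["member_age"].corr(g["distance_km"])
    )


rng = np.random.default_rng(42)
n = 500
age = rng.integers(18, 70, n)
demo = pd.DataFrame({
    "user_type": rng.choice(["Customer", "Subscriber"], n),
    "member_age": age,
    "distance_km": 3.0 - 0.02 * age + rng.normal(0, 0.1, n),
})
r_by_type = age_distance_corr_by(demo, "user_type")
print(r_by_type.round(2))
```

Passing `"member_gender"`, an age-group column, a weekday column, or `"bike_share_for_all_trip"` as `group_col` gives the corresponding per-category coefficients, which can then be compared with the overall member age vs. distance coefficient in the correlation matrix.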

5. One Categorical Variable and Two Numeric Variables¶

5.1 Member Age and Ride Distance(km) vs. User Type¶

Insights:¶

  • There is a negative correlation between member age and distance(km) for each user type at San Francisco, Oakland-Berkeley, and San Jose stations: as member age increases, the distance decreases for all user types in all three areas.
  • Most ride distances at San Francisco and Oakland-Berkeley stations are under 10km, but under 4km at San Jose stations.
  • The dominant user type at all stations is Subscriber.

5.2 Member Age and Ride Distance(km) vs. Member Gender¶

Insights:¶

  • There is a negative correlation between member age and distance(km) for each member gender at San Francisco, Oakland-Berkeley, and San Jose stations: as member age increases, the distance decreases for all member genders in all three areas.
  • Most ride distances at San Francisco and Oakland-Berkeley stations are under 10km, but under 4km at San Jose stations.

5.3 Member Age and Ride Distance(km) vs. Bike Share Status¶

Insights:¶

  • There is a negative correlation between member age and distance(km) for each bike share status at San Francisco, Oakland-Berkeley, and San Jose stations: as member age increases, the distance decreases for both bike share statuses in all three areas.
  • Most ride distances at San Francisco and Oakland-Berkeley stations are under 10km, but under 4km at San Jose stations.
  • The dominant bike share status at all stations is non-shared.

5.4 Member Age and Duration vs. Ride Weekday¶

Insights:¶

  • There is a negative correlation between member age and duration across weekdays at San Francisco, Oakland-Berkeley, and San Jose stations: as member age increases, the duration decreases in all three areas.

5.5 Member Age and Weekdays vs. Start Time Hour¶

Insights:¶

  • There is a positive correlation between member age and start time hour across weekdays at San Francisco, Oakland-Berkeley, and San Jose stations: as member age increases, the start time hour increases in all three areas.
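The text does not say how stations were assigned to San Francisco, Oakland-Berkeley, and San Jose. A rough, hypothetical rule based on station coordinates is sketched below: anything south of latitude 37.5 is treated as San Jose, and anything east of longitude -122.35 as Oakland-Berkeley. These thresholds are illustrative assumptions, not official service-area boundaries, and the demo rows use approximate city-centre coordinates rather than real stations.

```python
import pandas as pd


def station_region(lat: float, lon: float) -> str:
    """Rough, hypothetical split of Bay Area stations into three regions.

    Thresholds are illustrative assumptions, not official service-area bounds.
    """
    if lat < 37.5:
        return "San Jose"
    if lon > -122.35:
        return "Oakland-Berkeley"
    return "San Francisco"


stations = pd.DataFrame({
    "start_station_name": ["SF example", "Oakland example", "San Jose example"],
    "start_station_latitude": [37.7749, 37.8044, 37.3382],
    "start_station_longitude": [-122.4194, -122.2712, -121.8863],
})
stations["region"] = [
    station_region(lat, lon)
    for lat, lon in zip(stations["start_station_latitude"],
                        stations["start_station_longitude"])
]
print(stations[["start_station_name", "region"]])
```

The resulting column can then serve as the facet variable for the per-area plots, for example via seaborn's `FacetGrid(..., col="region")`.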

6. Plotting Two Categorical Variables vs. One Numeric Variable¶

6.1 User Type and Member Gender vs. Member Age¶

Insights:¶

Avg. member age per user type and member gender can be summarized as follows:

    1. Female members are the youngest (33 years for both customers and subscribers).
    2. Male members fall in between (34 years for customers and 35 years for subscribers).
    3. Members of other gender are the oldest (35 years for customers and 36 years for subscribers).

6.2 User Type and Member Gender vs. Distance(km)¶

Insights:¶

Avg. distance(km) per user type and member gender can be summarized as follows:

    1. Female, male, and other genders have the highest distances (1.9km for customers; 1.8km for female and other-gender subscribers).
    2. Not-defined gender has an average distance of 1.7km for both customer and subscriber user types.
    3. Male subscribers have the lowest distance (1.6km).

6.3 User Type and Member Gender vs. Durations in Seconds, Minutes, and Hours¶

Insights:¶

Avg. durations per user type and member gender can be summarized as follows:

    1. For every gender, customers have a higher duration(second) (1254-2062 seconds) than subscribers (618-912 seconds).
    2. For every gender, customers have a higher duration(minute) (20.9-34.4 minutes) than subscribers (10.3-15.2 minutes).
    3. For every gender, customers have a higher duration(hour) (0.3-0.6 hour) than subscribers (0.2-0.3 hour).

6.4 User Type and Member Gender vs. Start Time Hour and Day¶

Insights:¶

Avg. start time hour and day per user type and member gender can be summarized as follows:

    1. Avg. start time hour for all genders and user types falls between 1 PM and 2 PM.
    2. Avg. start time day for all genders and user types is day 15-16.

6.5 User Type and Member Gender vs. Bike Id¶

Insights:¶

Avg. bike id per user type and member gender can be summarized as follows:

    1. Avg. bike id for customers ranges from 3549 to 4418 across genders.
    2. Avg. bike id for subscribers ranges from 4421 to 4742 across genders.

7. Plotting Many Variables¶

7.1 User Type and Member Gender vs. Numeric Variables¶

Insights:¶

  • For every gender, subscribers' average member age is higher than customers'.
  • For every gender, customers' average durations are higher than subscribers'.
  • For every gender, customers' average start time hour and day are higher than subscribers', except for the other gender, where subscribers' averages are higher.
  • For every gender, customers' average distance(km) is higher than subscribers'.
  • For every gender, subscribers' average bike id is higher than customers'.

7.2 User Type and Weekday vs. Numeric Variables¶

Insights:¶

  • For every day of the week, subscribers' average member age is higher than customers'.
  • For every day of the week, customers' average durations are higher than subscribers'.
  • Customers' average start time hour is higher than subscribers' on every day except Saturday and Sunday, where subscribers' average is higher.
  • Customers' average start time day is higher than subscribers' on every day except Tuesday, where subscribers' average is higher.
  • For every day of the week, customers' average distance(km) is higher than subscribers'.
  • For every day of the week, subscribers' average bike id is higher than customers'.

7.3 User Type and Bike Share Status vs. Numeric Variables¶

Insights:¶

  • Subscribers, whether or not they share bikes, have a higher average member age than customers. Customers do not share bikes.
  • Customers (who do not share bikes) have a higher average duration than subscribers, whether or not the subscribers share bikes.
  • Customers' average start time hour is higher than that of subscribers who do not share bikes, but lower than that of subscribers who share bikes.
  • Customers' average start time day is higher than that of subscribers, whether or not the subscribers share bikes.
  • Customers' average distance(km) is higher than that of subscribers, whether or not the subscribers share bikes.
  • Subscribers' average bike id, whether or not they share bikes, is higher than customers'.
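Each "two categorical vs. one numeric" comparison boils down to a table of means with one categorical variable on the rows and the other on the columns, which is also the input a heatmap or clustered bar chart would be drawn from. A minimal sketch with `pandas.pivot_table`, assuming the raw `user_type` and `member_gender` columns and a derived `member_age` column; the rows are synthetic.

```python
import pandas as pd

# Synthetic rides, for illustration only
toy = pd.DataFrame({
    "user_type": ["Customer", "Customer", "Subscriber", "Subscriber", "Subscriber"],
    "member_gender": ["Female", "Male", "Female", "Male", "Male"],
    "member_age": [30, 31, 32, 36, 34],
    "duration_sec": [1500, 1300, 700, 600, 660],
})

# Rows: member gender; columns: user type; cells: mean of the numeric variable
age_table = pd.pivot_table(
    toy, values="member_age", index="member_gender", columns="user_type", aggfunc="mean"
)
duration_table = toy.pivot_table(
    values="duration_sec", index="member_gender", columns="user_type", aggfunc="mean"
)
print(age_table)
print(duration_table)
```

Swapping `values` for another numeric column (distance, start hour, bike id) or `index` for a weekday or bike share column gives the other tables in this section.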

Summary of Relationships between Variables of Interest:¶

I extended my investigation to categorical variables, including user type, member gender, ride weekday, age groups, ride stations/city, and bike share status, against numeric variables, including ride durations, ride times, member age, distance(km), and bike id. My goal is to explore how these factors affect the usage of the GoBike service and how they correlate with one another. The multivariate relationships are summarized as follows:

1. Correlation Matrix for Numeric Variables:¶

  • Most numeric variables have very weak positive or negative correlations, and some pairs have neutral (near-zero) correlations.

2. Two Categorical Variables and One Numeric Variable¶

User Types and Member Gender vs. Durations (second, minute, and hour):¶

  • Avg. duration(second) for customers (1432 seconds) is higher than that of subscribers (640 seconds).
  • Avg. duration(second) for Not-defined gender (1189 seconds) is higher than that of Other gender (997 seconds), Females (779 seconds), and Males (673 seconds).
  • Avg. duration(minute) for customers (24 minutes) is higher than that of subscribers (11 minutes).
  • Avg. duration(minute) for Not-defined gender (20 minutes) is higher than that of Other gender (17 minutes), Females (13 minutes), and Males (11 minutes).
  • Avg. duration(hour) for customers (0.4 hour) is higher than that of subscribers (0.18 hour).
  • Avg. duration(hour) for Not-defined gender (0.33 hour) is higher than that of Other gender (0.28 hour), Females (0.22 hour), and Males (0.19 hour).

User Types and Member Gender vs. Start and End Time Hours and Days:¶

  • Avg. Start time(hour) for Subscriber(1.4 PM) is earlier than that of Customer(1.6 PM).
  • Avg. Start time(hour) for Female gender (1.2 PM) is earlier than that of Not-defined(1.5 PM), Male (1.52 PM), and Other(1.7 PM).
  • Avg. End time(hour) for Subscriber(1.6 PM) is earlier than that of Customer(1.9 PM).
  • Avg. End time(hour) for Female gender (1.4 PM) is earlier than that of Male (1.7 PM), Not-defined(1.72 PM), and Other(1.8 PM).
  • Avg. Start time(day) for Subscriber(day 15) is earlier than that of Customer(day 16).
  • Avg. End time(day) is the same (day 15) for all genders.

User Types and Member Gender vs. Member Age, Trip Distance(km), and Bike Ids:¶

  • Avg. age for Subscriber(33 years) is higher than that of Customer(28 years).
  • Avg. age for Other gender (36 years) is higher than both Male(34 years) and Female(33 years).
  • Avg. distance(km) for customers (1.9km) is higher than that of subscribers(1.7km).
  • Avg. distance(km) for Other gender(1.9km) is higher than that of Female(1.77km), Not-defined(1.72km), and Male(1.66km).
  • Avg. most used bike id for Subscriber(4503) is higher than that of Customer(4226).
  • Avg. most used bike id for Other gender(4543) is higher than that of Male(4507), Female(4397), and Not-defined(4275).

Start and End Weekday vs. Durations (second, minute, and hour):¶

  • Avg. Duration(second) for start weekend days(903-920 seconds) is higher than that of working days(663-713 seconds).
  • Avg. Duration(second) for end weekend days(885-931 seconds) is higher than that of working days(663-714 seconds).
  • Avg. Duration(minute) for start weekend days(15-15.3 minutes) is higher than that of working days(11-11.9 minutes).
  • Avg. Duration(minute) for end weekend days(14-15.5 minutes) is higher than that of working days(11-11.9 minutes).
  • Avg. Duration(hour) for start weekend days(0.25-0.26 hour) is higher than that of working days(0.18-0.20 hour).
  • Avg. Duration(hour) for end weekend days(0.25-0.26 hour) is higher than that of working days(0.18-0.20 hour).

Start and End Weekday vs. Start and End Time Hours and Days:¶

  • Avg. Start Time(hour) for working days (12.8 PM-1.7 PM) is earlier than that of weekend days (1.7 PM-2.3 PM).
  • Avg. End Time(hour) for working days (1 PM-1.9 PM) is earlier than that of weekend days (1.8 PM-2.3 PM).
  • Avg. Start and End Time(day) ranges from day 12 to day 17 for working days and from day 14 to day 15 for weekend days.

Start and End Weekday vs. Member Age, Trip Distance(km), and Bike Ids:¶

  • Avg. Member Age for weekend start and end days (30-31 years) is lower than that of working days (31-33 years).
  • Avg. Distance(km) for working start and end days (1.7-1.72km) is higher than that of weekend days (1.6km).
  • Avg. Bike Id for weekend start and end days (4575-4628) is higher than that of working days (4400-4518).

Bike Share Status and Age Groups vs. Durations (second, minute, and hour):¶

  • Avg. Duration(second) for non-shared bikes (730 seconds) is higher than that of shared bikes (684 seconds).
  • Avg. Duration(second) for age group 60-70 (764 seconds) is higher than that of the other groups (626-736 seconds): the middle groups 18-60 average 689-736 seconds, and the lowest is age group 70-140 (626 seconds).
  • Avg. Duration(minute) for non-shared bikes (12 minutes) is higher than that of shared bikes (11 minutes).
  • Avg. Duration(minute) for age group 60-70 (12.73 minutes) is higher than that of the other groups (10.4-12.27 minutes): the middle groups 18-60 average 11.5-12.27 minutes, and the lowest is age group 70-140 (10.4 minutes).
  • Avg. Duration(hour) for non-shared bikes (0.20 hour) is higher than that of shared bikes (0.19 hour).
  • Avg. Duration(hour) for age group 60-70 (0.21 hour) is higher than that of the other groups (0.17-0.20 hour): the middle groups 18-60 average 0.19-0.20 hour, and the lowest is age group 70-140 (0.17 hour).

Bike Share Status and Age Groups vs. Start and End Time Hours and Days:¶

  • Avg. Start Time(hour) for non-shared bikes(1.4 PM) is earlier than that of shared bikes(2.1 PM).
  • Avg. End Time(hour) for non-shared bikes(1.5 PM) is earlier than that of shared bikes(2.3 PM).
  • Avg. Start Time(hour) for age group 70-141 (12.82 PM) is earlier than that of the other age groups (12.87 PM-2.4 PM): the middle groups 20-70 average 12.9 PM-1.75 PM, and the youngest group, 18-20, starts latest (2.4 PM).
  • Avg. End Time(hour) for age group 70-141 (1 PM) is earlier than that of the other age groups (1.05 PM-2.5 PM): the middle groups 20-70 average 1.05 PM-1.9 PM, and the youngest group, 18-20, ends latest (2.5 PM).
  • Avg. Start and End Time(day) for both non-shared bikes and shared bikes is day 15.
  • Avg. Start and End Time(day) for age groups 18-70 is day 15 and for age group 70-141 is day 17.

Bike Share Status and Age Groups vs. Member Age, Trip Distance(km), and Bike Ids:¶

  • Avg. Member Age for non-shared bikes is 33 years and for shared bikes is 32 years.
  • Avg. Distance(km) is 1.7km for non-shared bikes and 1.3km for shared bikes.
  • Avg. Distance(km) for age group 30-40 (1.8km) is higher than that of the other age groups (1.3km-1.7km); most of the other groups average 1.5km-1.7km, while the youngest group, 18-20, averages only 1.3km.
  • Avg. Bike Id for non-shared bikes (4483) is higher than that of shared bikes (4379).
  • Avg. Bike Id for age group 30-40 (4520) is higher than that of the other age groups (4090-4517); the lowest is age group 60-70 (4090).

3. Three Numeric Variables:¶

Member Age and Ride Distance(km) vs. Ride Start and End Times in Hours and Days:¶

  • There is a negative correlation between member age and distance(km) across start and end time hours and days: as member age increases, the distance decreases.

Member Age and Ride Distance(km) vs. Durations in Seconds, Minutes, and Hours:¶

  • There is a negative correlation between member age and distance(km) across durations: as member age increases, the distance decreases for durations in seconds, minutes, and hours.

Member Age and Ride Distance(km) vs. Bike Id:¶

  • There is a negative correlation between member age and distance(km) across bike ids: as member age increases, the distance decreases.

4. Two Numeric Variables vs. One Categorical Variable:¶

Member Age and Ride Distance(km) vs. User Type:¶

  • There is a negative correlation between member age and distance(km) for each user type: as member age increases, the distance decreases for both subscribers and customers.

Member Age and Ride Distance(km) vs. Member Gender:¶

  • There is a negative correlation between member age and distance(km) for each member gender: as member age increases, the distance decreases for male, female, and other genders alike.

Member Age and Ride Distance(km) vs. Age Groups:¶

  • There is a negative correlation between member age and distance(km) across age groups: as member age increases, the distance decreases from one age group to the next.

Member Age and Ride Distance(km) vs. Ride Start and End Weekdays:¶

  • There is a negative correlation between member age and distance(km) for each start and end weekday: as member age increases, the distance decreases on every weekday.

Member Age and Ride Distance(km) vs. Bike Share Status:¶

  • There is a negative correlation between member age and distance(km) for each bike share status: as member age increases, the distance decreases for both shared and non-shared bikes.

5. One Categorical Variable and Two Numeric Variables¶

Member Age and Ride Distance(km) vs. User Type: San Francisco, Oakland-Berkeley, and San Jose Stations¶

  • There is a negative correlation between member age and distance(km) for each user type at San Francisco, Oakland-Berkeley, and San Jose stations: as member age increases, the distance decreases for all user types in all three areas.
  • Most ride distances at San Francisco and Oakland-Berkeley stations are under 10km, but under 4km at San Jose stations.
  • The dominant user type at all stations is Subscriber.

Member Age and Ride Distance(km) vs. Member Gender: San Francisco, Oakland-Berkeley, and San Jose Stations¶

  • There is a negative correlation between member age and distance(km) for each member gender at San Francisco, Oakland-Berkeley, and San Jose stations: as member age increases, the distance decreases for all member genders in all three areas.
  • Most ride distances at San Francisco and Oakland-Berkeley stations are under 10km, but under 4km at San Jose stations.

Member Age and Ride Distance(km) vs. Bike Share Status: San Francisco, Oakland-Berkeley, and San Jose Stations¶

  • There is a negative correlation between member age and distance(km) for each bike share status at San Francisco, Oakland-Berkeley, and San Jose stations: as member age increases, the distance decreases for both bike share statuses in all three areas.
  • Most ride distances at San Francisco and Oakland-Berkeley stations are under 10km, but under 4km at San Jose stations.

Member Age and Duration vs. Ride Weekday: San Francisco, Oakland-Berkeley, and San Jose Stations¶

  • There is a negative correlation between member age and duration across weekdays at San Francisco, Oakland-Berkeley, and San Jose stations: as member age increases, the duration decreases in all three areas.

Member Age and Weekdays vs. Start Time Hour: San Francisco, Oakland-Berkeley, and San Jose Stations¶

  • There is a positive correlation between member age and start time hour across weekdays at San Francisco, Oakland-Berkeley, and San Jose stations: as member age increases, the start time hour increases in all three areas.

6. Two Categorical Variables vs. One Numeric Variable:¶

User Type and Member Gender vs. Member Age:¶

Avg. member age per user type and member gender can be summarized as follows:

  • Female members are the youngest (33 years for both customers and subscribers).
  • Male members fall in between (34 years for customers and 35 years for subscribers).
  • Members of other gender are the oldest (35 years for customers and 36 years for subscribers).

User Type and Member Gender vs. Distance(km):¶

Avg. distance(km) per user type and member gender can be summarized as follows:

  • Female, male, and other genders have the highest distances (1.9km for customers; 1.8km for female and other-gender subscribers).
  • Not-defined gender has an average distance of 1.7km for both customer and subscriber user types.
  • Male subscribers have the lowest distance (1.6km).

User Type and Member Gender vs. Durations in Seconds, Minutes, and Hours:¶

Avg. durations per user type and member gender can be summarized as follows:

  • For every gender, customers have a higher duration(second) (1254-2062 seconds) than subscribers (618-912 seconds).
  • For every gender, customers have a higher duration(minute) (20.9-34.4 minutes) than subscribers (10.3-15.2 minutes).
  • For every gender, customers have a higher duration(hour) (0.3-0.6 hour) than subscribers (0.2-0.3 hour).

User Type and Member Gender vs. Start Time Hour and Day:¶

Avg. start time hour and day per user type and member gender can be summarized as follows:

  • Avg. start time hour for all genders and user types falls between 1 PM and 2 PM.
  • Avg. start time day for all genders and user types is day 15-16.

User Type and Member Gender vs. Bike Id:¶

Avg. bike id per user type and member gender can be summarized as follows:

  • Avg. bike id for customers ranges from 3549 to 4418 across genders.
  • Avg. bike id for subscribers ranges from 4421 to 4742 across genders.

7. Many Variables¶

User Type and Member Gender vs. All Numeric Variables:¶

  • Avg. member ages per subscriber genders is higher than those of customers.
  • Avg. durations per customer genders is higher than those of subscribers.
  • Avg. start time hour and day per customer genders are higher than those of subscriber, except for the other gender as its average start time hour and day per subscriber are higher than those of customer.
  • Avg. distance(km) per customer genders is higher than those of subscribers.
  • Avg. bike id per subscriber genders is higher than those of subscribers.

User Type and Weekday vs. All Numeric Variables:¶

  • On every day of the week, subscribers' average member age is higher than customers'.
  • On every day of the week, customers' average durations are higher than subscribers'.
  • Customers' average start time hour is higher than subscribers' on every day except Saturday and Sunday, when subscribers' average start time hour is higher.
  • Customers' average start time day is higher than subscribers' on every day except Tuesday, when subscribers' average start time day is higher.
  • On every day of the week, customers' average distance (km) is higher than subscribers'.
  • On every day of the week, subscribers' average bike id is higher than customers'.
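The weekday breakdowns above depend on deriving each ride's day of the week, and whether it falls on a weekend, from its start time. The sketch below assumes start times are strings laid out like `2019-02-28 17:32:10.1450`, which is assumed to match the public trip CSV; adjust the format string if yours differs. `start_time_features` is a hypothetical helper name.

```python
from datetime import datetime

WEEKDAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")

def start_time_features(start_time: str) -> dict:
    """Derive weekday, weekend flag, start hour, and start day from a start_time string.

    The format string is an assumption; %f accepts 1 to 6 fractional digits.
    """
    ts = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S.%f")
    return {
        "weekday": WEEKDAY_NAMES[ts.weekday()],  # datetime.weekday(): Monday == 0
        "is_weekend": ts.weekday() >= 5,          # Saturday (5) or Sunday (6)
        "start_hour": ts.hour,
        "start_day": ts.day,
    }
```

Indexing a fixed tuple of names avoids the locale dependence of `strftime("%A")`. In pandas, `pd.to_datetime(df['start_time'])` followed by `.dt.day_name()`, `.dt.hour`, and `.dt.day` gives the same columns.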

User Type and Bike Share Status vs. All Numeric Variables:¶

  • Subscribers' average member age, whether or not they share bikes, is higher than customers'. Customers don't share bikes.
  • Customers' average durations are higher than those of subscribers, whether or not the subscribers share bikes.
  • Customers' average start time hour is higher than that of subscribers who don't share bikes, but lower than that of subscribers who share bikes.
  • Customers' average start time day is higher than that of subscribers, whether or not the subscribers share bikes.
  • Customers' average distance (km) is higher than that of subscribers, whether or not the subscribers share bikes.
  • Subscribers' average bike id, whether or not they share bikes, is higher than customers'.

Most Interesting or Surprising Interactions between Features:¶

Some interesting or surprising interactions may include:

  • Most numeric variables are only very weakly correlated, except for start time hour and end time hour, which are very strongly correlated (0.98).
  • In general, member age and distance (km) are negatively correlated when broken down by start and end time hours, duration, user type, member gender, age group, start and end time weekdays, or bike share status.
  • Moreover, at San Francisco, Oakland-Berkeley, and San Jose stations, member age and distance (km) are negatively correlated across user types, member genders, and bike share status. Across weekdays, member age is also negatively correlated with duration and with start time hour.
  • Member age and distance (km) are negatively correlated with start and end time hour, and bike id is negatively correlated with durations and member age. The other numeric variables have very weak or near-zero correlations.
  • Although subscribers account for 89% of total rides, customers' average duration (1432 seconds) is higher than subscribers' (640 seconds); in minutes, customers average 24 minutes versus 11 for subscribers. In other words, customers' rides last more than twice as long on average (0.4 hour versus 0.18 hour).
  • For all genders, customers have higher average durations than subscribers: 20.9-34.4 minutes versus 10.3-15.2 minutes, or 0.3-0.6 hour versus 0.2-0.3 hour.
  • Females' average duration is higher than males': 779 versus 673 seconds, 13 versus 11 minutes, or 0.22 versus 0.19 hour.
  • Avg. start time hour for all genders and user types is between 1 PM and 2 PM.
  • Avg. start day of the month for all genders and user types is the 15th to the 16th.
  • Subscribers' average start time hour (1.4 PM) is earlier than customers' (1.6 PM), and subscribers' average end time hour (1.6 PM) is earlier than customers' (1.9 PM).
  • Subscribers' average age (33 years) is higher than customers' (28 years).
  • Males' average age (34 years) is higher than females' (33 years).
  • Customers' average distance (1.9 km) is higher than subscribers' (1.7 km).
  • Females' average distance (1.77 km) is higher than males' (1.66 km).
  • Rides starting on weekend days last longer on average than rides starting on working days: 903-920 versus 663-713 seconds, 15-15.3 versus 11-11.9 minutes, or 0.25-0.26 versus 0.18-0.20 hour.
  • The average start time hour on working days (12.8 PM-1.7 PM) is earlier than on weekend days (1.7 PM-2.3 PM).
  • By start and end day, riders on weekend days are younger on average (30-31 years) than riders on working days (31-33 years).
  • By start and end day, the average distance on working days (1.7-1.72 km) is higher than on weekend days (1.6 km).
  • Age group 60-70 has the highest average duration (12.73 minutes), compared with 10.4-12.26 minutes for the other groups. The middle groups, 18-60, average 11.5-12.27 minutes, and the 70-140 group has the lowest average duration (10.4 minutes).
  • In hours, age group 60-70 also has the highest average duration (0.21 hour), compared with 0.17-0.20 hour for the other groups. The middle groups, 18-60, average 0.19-0.20 hour, and the 70-140 group has the lowest (0.17 hour).
  • Age group 70-141 has the earliest average start time hour (12.82 PM), compared with 12.87 PM-2.4 PM for the other age groups. The middle groups, 20-70, average 12.9 PM-1.75 PM, and the 18-20 group starts latest (2.4 PM).
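The correlation values quoted above (for example, 0.98 between start time hour and end time hour) are presumably Pearson coefficients, which is what pandas' `DataFrame.corr()` computes by default. The strong start/end hour correlation is expected: with average durations of roughly 11 to 24 minutes, most rides end in the same hour they start or the next one. Below is a from-scratch sketch of the coefficient, checked on hypothetical hours rather than the real data.

```python
from math import sqrt
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = fmean(xs), fmean(ys)
    # Covariance numerator and the two standard-deviation terms (the 1/n factors cancel)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical start/end hours for four short rides; the last one crosses an hour boundary
start_hours = [8, 9, 17, 18]
end_hours = [8, 9, 17, 19]
r = pearson(start_hours, end_hours)  # close to 1: a very strong positive correlation
```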

Summary¶

In conclusion, the Ford GoBike dataset for February 2019 consists of 183,412 rides. Subscribers account for 89% of total rides and customers for the remaining 11%. The findings can be summarized as follows:

Durations:¶

For all genders, customers have higher average durations than subscribers: 20.9-34.4 minutes versus 10.3-15.2 minutes, or 0.3-0.6 hour versus 0.2-0.3 hour. Females' average duration (0.22 hour) is also higher than males' (0.19 hour).

Start/End Times:¶

The average start time hour on working days (12.8 PM-1.7 PM) is earlier than on weekend days (1.7 PM-2.3 PM). For all genders and user types, the average start time hour is between 1 PM and 2 PM. Subscribers' average start time hour (1.4 PM) is earlier than customers' (1.6 PM), and females' (1.2 PM) is earlier than males' (1.52 PM).
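Start times such as 12.8 PM or 1.4 PM appear to be averages of the 24-hour start hour written with a PM suffix. On a 24-hour decimal scale they would read as 12.8 and 13.4; that reading is an assumption. Under it, converting to clock time makes the gaps easier to judge: 1.4 PM versus 1.6 PM is a 12-minute difference. Below is a small helper sketch; `decimal_hour_to_clock` is a hypothetical name.

```python
def decimal_hour_to_clock(hour: float) -> str:
    """Convert a decimal hour on the 24-hour scale (e.g. 13.4) to 'H:MM AM/PM'."""
    total_minutes = round(hour * 60)
    # Wrap to one day, then split into 24-hour clock hour and minutes
    h24, minutes = divmod(total_minutes % (24 * 60), 60)
    suffix = "AM" if h24 < 12 else "PM"
    h12 = h24 % 12 or 12  # 0 -> 12 AM, 12 -> 12 PM
    return f"{h12}:{minutes:02d} {suffix}"
```

Under this assumption, 12.8 becomes 12:48 PM, 13.4 becomes 1:24 PM, and 13.7 becomes 1:42 PM.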

Member Age:¶

Subscribers' average age (33 years) is higher than customers' (28 years), and males' average age (34 years) is higher than females' (33 years).
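Member ages are presumably derived from each member's birth year. The sketch below assumes that age = 2019 - birth year (the year of the data) and that missing birth years are left as None. Both the reference year and the missing-value handling are assumptions; `member_age` is a hypothetical helper.

```python
import math
from typing import Optional

DATA_YEAR = 2019  # assumption: ages are computed relative to the year of the February 2019 data

def member_age(birth_year: Optional[float]) -> Optional[int]:
    """Approximate age in DATA_YEAR from a birth year; None if the birth year is missing."""
    if birth_year is None or math.isnan(birth_year):
        return None
    return DATA_YEAR - int(birth_year)
```

Because only the birth year is known, this overstates by one year the age of any rider whose birthday falls after the ride date.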

Distance:¶

Customers' average distance (1.9 km) is higher than subscribers' (1.7 km), and females' average distance (1.77 km) is higher than males' (1.66 km). By start and end day, the average distance on working days (1.7-1.72 km) is higher than on weekend days (1.6 km).

Weekdays:¶

The average duration on working days (Monday-Friday) ranges from 11.1 to 11.9 minutes (0.18 to 0.2 hour), while on weekend days it ranges from 15 to 15.3 minutes (0.25 to 0.26 hour). The average start time hour on working days ranges from 12.8 PM to 1.7 PM, while on weekend days it ranges from 1.7 PM to 2.2 PM. By start and end day, the average distance on working days ranges from 1.67 km to 1.73 km, while on weekend days it is 1.6 km.

Bike Share:¶

The average duration of rides on non-shared bikes (12 minutes) is higher than that of rides on shared bikes (11 minutes). The average distance is 1.7 km for non-shared bikes and 1.3 km for shared bikes.

Age Groups:¶

Age group 60-70 has the highest average duration (12.73 minutes), compared with 10.4-12.26 minutes for the other groups. The middle groups, 18-60, average 11.5-12.27 minutes, and the 70-140 group has the lowest (10.4 minutes). Age group 30-40 has the highest average distance (1.8 km), compared with 1.3-1.7 km for the other age groups. The middle age groups (20-30 and 70-141) average 1.5-1.7 km, and the lowest, 18-20, averages 1.3 km.
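The age groups above can be reproduced by binning ages into intervals. The bin edges below (18, 20, 30, ..., 70, 141) are inferred from the group labels used in this section. Treating each bin as including its lower bound and excluding its upper bound is an assumption. In pandas, `pd.cut(ages, bins=EDGES, labels=LABELS, right=False)` would produce the same labels.

```python
from bisect import bisect_right
from typing import Optional

# Bin edges inferred from the group labels; each bin includes its lower edge: [18, 20), [20, 30), ...
EDGES = [18, 20, 30, 40, 50, 60, 70, 141]
LABELS = [f"{lo}-{hi}" for lo, hi in zip(EDGES, EDGES[1:])]

def age_group(age: float) -> Optional[str]:
    """Return the age-group label for an age, or None if it falls outside all bins."""
    if age < EDGES[0] or age >= EDGES[-1]:
        return None
    # bisect_right counts the edges <= age, so subtracting 1 gives the bin index
    return LABELS[bisect_right(EDGES, age) - 1]
```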

Stations/City:¶

San Francisco has 156 stations, Oakland_Berkeley has 127 stations, and San Jose has 47 stations. The average number of rides per station in San Francisco is , in Oakland_Berkeley is 1,480, and in San Jose is 295. The average member age is 33 years for San Francisco's stations, 34 years for Oakland_Berkeley's stations, and 31 years for San Jose's stations. The average duration is 0.23 hour for San Francisco's stations and 0.21 hour for Oakland_Berkeley's and San Jose's stations. The average distance is 1.9 km for San Francisco's stations and 1.7 km for Oakland_Berkeley's and San Jose's stations.
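The per-station ride averages are the number of rides in a city divided by the number of distinct stations in it. This section does not show how stations were assigned to San Francisco, Oakland_Berkeley, or San Jose. The sketch therefore assumes each ride already carries a city label, and it counts rides by start station, which is also an assumption. The station IDs in the example are hypothetical.

```python
from collections import defaultdict

def rides_per_station(rides):
    """Average number of rides per distinct station, per city.

    rides: iterable of (city, start_station_id) pairs; the city label is assumed to be pre-assigned.
    """
    ride_counts = defaultdict(int)
    stations = defaultdict(set)
    for city, station_id in rides:
        ride_counts[city] += 1
        stations[city].add(station_id)
    return {city: ride_counts[city] / len(stations[city]) for city in ride_counts}


# Hypothetical rides: three rides from two San Jose stations, one from an Oakland_Berkeley station
averages = rides_per_station([
    ("San Jose", 310), ("San Jose", 310), ("San Jose", 311),
    ("Oakland_Berkeley", 200),
])
```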

Sources¶

I have used the following sources in my exploration:

  • Wikipedia and BikeShare.cc - Introduction about Ford GoBike.
  • Github.io - Handling Missing Values
  • Github.com - Haversine Formula - Calculate distance between latitude longitude pairs with Python
  • Nbconvert - Exclude input cell code with TemplateExporter